© 2000 Morgan Kaufman Overheads for Computers as Components Program design and analysis zCompilation flow. zBasic statement translation. zBasic optimizations.

Slides:



Advertisements
Similar presentations
Compiler Support for Superscalar Processors. Loop Unrolling Assumption: Standard five stage pipeline Empty cycles between instructions before the result.
Advertisements

The University of Adelaide, School of Computer Science
© 2008 Wayne Wolf Overheads for Computers as Components 2 nd ed. Program design and analysis zCompilation flow. zBasic statement translation. zBasic optimizations.
Optimizing ARM Assembly Computer Organization and Assembly Languages Yung-Yu Chuang with slides by Peng-Sheng Chen.
© 2000 Morgan Kaufman Overheads for Computers as Components ARM instruction set zARM versions. zARM assembly language. zARM programming model. zARM memory.
CPU Review and Programming Models CT101 – Computing Systems.
ECE 454 Computer Systems Programming Compiler and Optimization (I) Ding Yuan ECE Dept., University of Toronto
Computer Architecture Lecture 7 Compiler Considerations and Optimizations.
Computer Organization and Architecture (AT70.01) Comp. Sc. and Inf. Mgmt. Asian Institute of Technology Instructor: Dr. Sumanta Guha Slide Sources: Based.
UEE072HM Linking HLL and ALP An example on ARM. Embedded and Real-Time Systems We will mainly look at embedded systems –Systems which have the computer.
1 Code Optimization Code produced by compilation algorithms can often be improved (ideally optimized) in terms of run-time speed and the amount of memory.
CPS3340 COMPUTER ARCHITECTURE Fall Semester, /17/2013 Lecture 12: Procedures Instructor: Ashraf Yaseen DEPARTMENT OF MATH & COMPUTER SCIENCE CENTRAL.
F28PL1 Programming Languages Lecture 5: Assembly Language 4.
Lecture 11 – Code Generation Eran Yahav 1 Reference: Dragon 8. MCD
1 Chapter 7: Runtime Environments. int * larger (int a, int b) { if (a > b) return &a; //wrong else return &b; //wrong } int * larger (int *a, int *b)
1 Handling nested procedures Method 1 : static (access) links –Reference to the frame of the lexically enclosing procedure –Static chains of such links.
© 2000 Morgan Kaufman Overheads for Computers as Components Program design and analysis zCompilation flow. zBasic statement translation. zBasic optimizations.
CSCE 121, Sec 200, 507, 508 Fall 2010 Prof. Jennifer L. Welch.
Choice for the rest of the semester New Plan –assembler and machine language –Operating systems Process scheduling Memory management File system Optimization.
Embedded Software Development. Page 2 Hardware and software architectures Hardware and software are intimately related: software doesn’t run without hardware;
Improving code generation. Better code generation requires greater context Over expressions: optimal ordering of subtrees Over basic blocks: Common subexpression.
Embedded Computer Systems Chapter1: Embedded Computing Eng. Husam Y. Alzaq Islamic University of Gaza.
ARM Core Architecture. Common ARM Cortex Core In the case of ARM-based microcontrollers a company named ARM Holdings designs the core and licenses it.
Topic 8: Data Transfer Instructions CSE 30: Computer Organization and Systems Programming Winter 2010 Prof. Ryan Kastner Dept. of Computer Science and.
Chapter 1 Algorithm Analysis
Topic #10: Optimization EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
© 2000 Morgan Kaufman Overheads for Computers as Components Instruction sets zComputer architecture taxonomy. zAssembly language. ARM processor  PDA’s,
Instruction Set Design by Kip R. Irvine (c) Kip Irvine, All rights reserved. You may modify and copy this slide show for your personal use,
© 2000 Morgan Kaufman Overheads for Computers as Components Program design and analysis zDesigning embedded programs is more difficult and challenging.
Chapter 8 Intermediate Code Zhang Jing, Wang HaiLing College of Computer Science & Technology Harbin Engineering University.
Computer Architecture Instruction Set Architecture Lynn Choi Korea University.
© 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Program design and analysis Software components. Representations of programs. Assembly.
5-1 Chapter 5 - Languages and the Machine Department of Information Technology, Radford University ITEC 352 Computer Organization Principles of Computer.
CSc 453 Final Code Generation Saumya Debray The University of Arizona Tucson.
1 Code Generation Part II Chapter 9 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2005.
Computer architecture Lecture 11: Reduced Instruction Set Computers Piotr Bilski.
© 2000 Morgan Kaufman Overheads for Computers as Components Program design and analysis zOptimizing for execution time. zOptimizing for energy/power. zOptimizing.
Chapter 5 Program Design and Analysis 金仲達教授 清華大學資訊工程學系 (Slides are taken from the textbook slides)
5-1 Chapter 5 - Languages and the Machine Principles of Computer Architecture by M. Murdocca and V. Heuring © 1999 M. Murdocca and V. Heuring Principles.
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT Optimizing Compilers CISC 673 Spring 2011 Dependence Analysis and Loop Transformations.
4-1 Chapter 4 - The Instruction Set Architecture Principles of Computer Architecture by M. Murdocca and V. Heuring © 1999 M. Murdocca and V. Heuring Principles.
Introduction to Code Generation and Intermediate Representations
M. Mateen Yaqoob The University of Lahore Spring 2014.
Chapter 10 Code Optimization Zhang Jing, Wang HaiLing College of Computer Science & Technology Harbin Engineering University.
© 2000 Morgan Kaufman Overheads for Computers as Components Architectures and instruction sets zComputer architecture taxonomy. zAssembly language.
F28PL1 Programming Languages Lecture 4: Assembly Language 3.
3/2/2016© Hal Perkins & UW CSES-1 CSE P 501 – Compilers Optimizing Transformations Hal Perkins Autumn 2009.
Code Optimization Code produced by compilation algorithms can often be improved (ideally optimized) in terms of run-time speed and the amount of memory.
Code Optimization Overview and Examples
CS2100 Computer Organisation
Chapter 12 Variables and Operators
CSL718 : VLIW - Software Driven ILP
ARM Assembly Programming
Instructions - Type and Format
Code Generation.
Optimizing Transformations Hal Perkins Autumn 2011
CSCE Fall 2013 Prof. Jennifer L. Welch.
Overheads for Computers as Components 2nd ed.
Optimizing Transformations Hal Perkins Winter 2008
CSCE Fall 2012 Prof. Jennifer L. Welch.
Overheads for Computers as Components 2nd ed.
Optimizing ARM Assembly
Overheads for Computers as Components 2nd ed.
Program Design and Analysis Chapter 5
ARM ORGANISATION.
Chapter 6 Programming the basic computer
CSc 453 Final Code Generation
Introduction to Optimization
Presentation transcript:

© 2000 Morgan Kaufman Overheads for Computers as Components Program design and analysis zCompilation flow. zBasic statement translation. zBasic optimizations. zInterpreters and just-in-time compilers.

© 2000 Morgan Kaufman Overheads for Computers as Components Compilation zCompilation strategy (Wirth): ycompilation = translation + optimization zCompiler determines quality of code: yuse of CPU resources; ymemory access scheduling; ycode size.

© 2000 Morgan Kaufman Overheads for Computers as Components Basic compilation phases HLL parsing, symbol table machine-independent optimizations machine-dependent optimizations assembly

© 2000 Morgan Kaufman Overheads for Computers as Components Statement translation and optimization zSource code is translated into intermediate form such as CDFG. zCDFG is transformed/optimized. zCDFG is translated into instructions with optimization decisions. zInstructions are further optimized.

© 2000 Morgan Kaufman Overheads for Computers as Components Arithmetic expressions a*b + 5*(c-d) expression DFG *- * + a b cd 5

© 2000 Morgan Kaufman Overheads for Computers as Components Arithmetic expressions, cont’d. ADR r4,a MOV r1,[r4] ADR r4,b MOV r2,[r4] MUL r3,r1,r2 DFG *- * + a b cd 5 ADR r4,c MOV r1,[r4] ADR r4,d MOV r5,[r4] SUB r6,r4,r5 MUL r7,r6,#5 ADD r8,r7,r3 code

© 2000 Morgan Kaufman Overheads for Computers as Components Control code generation if (a+b > 0) x = 5; else x = 7; a+b>0x=5 x=7

© 2000 Morgan Kaufman Overheads for Computers as Components 3 21 Control code generation, cont’d. ADR r5,a LDR r1,[r5] ADR r5,b LDR r2,b ADD r3,r1,r2 BLE label3 a+b>0x=5 x=7 LDR r3,#5 ADR r5,x STR r3,[r5] B stmtent label3 LDR r3,#7 ADR r5,x STR r3,[r5] stmtent...

© 2000 Morgan Kaufman Overheads for Computers as Components Procedure linkage zNeed code to: ycall and return; ypass parameters and results. zParameters and returns are passed on stack. yProcedures with few parameters may use registers.

© 2000 Morgan Kaufman Overheads for Computers as Components Procedure stacks proc1 growth proc1(int a) { proc2(5); } proc2 SP stack pointer FP frame pointer 5 accessed relative to SP

© 2000 Morgan Kaufman Overheads for Computers as Components ARM procedure linkage zAPCS (ARM Procedure Call Standard): yr0-r3 pass parameters into procedure. Extra parameters are put on stack frame. yr0 holds return value. yr4-r7 hold register values. yr11 is frame pointer, r13 is stack pointer. yr10 holds limiting address on stack size to check for stack overflows.

© 2000 Morgan Kaufman Overheads for Computers as Components Data structures zDifferent types of data structures use different data layouts. zSome offsets into data structure can be computed at compile time, others must be computed at run time.

© 2000 Morgan Kaufman Overheads for Computers as Components One-dimensional arrays zC array name points to 0th element: a[0] a[1] a[2] a = *(a + 1)

© 2000 Morgan Kaufman Overheads for Computers as Components Two-dimensional arrays zColumn-major layout: a[0,0] a[0,1] a[1,0] a[1,1] = a[i*M+j]... M N

© 2000 Morgan Kaufman Overheads for Computers as Components Structures zFields within structures are static offsets: field1 field2 aptr struct { int field1; char field2; } mystruct; struct mystruct a, *aptr = &a; 4 bytes *(aptr+4)

© 2000 Morgan Kaufman Overheads for Computers as Components Expression simplification zConstant folding: y8+1 = 9 zAlgebraic: ya*b + a*c = a*(b+c) zStrength reduction: ya*2 = a<<1

© 2000 Morgan Kaufman Overheads for Computers as Components Dead code elimination zDead code: #define DEBUG 0 if (DEBUG) dbg(p1); zCan be eliminated by analysis of control flow, constant folding. 0 dbg(p1); 1 0

© 2000 Morgan Kaufman Overheads for Computers as Components Procedure inlining zEliminates procedure linkage overhead: int foo(a,b,c) { return a + b - c;} z = foo(w,x,y);  z = w + x + y;

© 2000 Morgan Kaufman Overheads for Computers as Components Loop transformations zGoals: yreduce loop overhead; yincrease opportunities for pipelining; yimprove memory system performance.

© 2000 Morgan Kaufman Overheads for Computers as Components Loop unrolling zReduces loop overhead, enables some other optimizations. for (i=0; i<4; i++) a[i] = b[i] * c[i];  for (i=0; i<2; i++) { a[i*2] = b[i*2] * c[i*2]; a[i*2+1] = b[i*2+1] * c[i*2+1]; }

© 2000 Morgan Kaufman Overheads for Computers as Components Loop fusion and distribution zFusion combines two loops into 1: for (i=0; i<N; i++) a[i] = b[i] * 5; for (j=0; j<N; j++) w[j] = c[j] * d[j];  for (i=0; i<N; i++) { a[i] = b[i] * 5; w[i] = c[i] * d[i]; } zDistribution breaks one loop into two. zChanges optimizations within loop body.

© 2000 Morgan Kaufman Overheads for Computers as Components Loop tiling zBreaks one loop into a nest of loops. zChanges order of accesses within array. yChanges cache behavior.

© 2000 Morgan Kaufman Overheads for Computers as Components Loop tiling example for (i=0; i<N; i++) for (j=0; j<N; j++) c[i] = a[i,j]*b[i]; for (i=0; i<N; i+=2) for (j=0; j<N; j+=2) for (ii=0; ii<min(i+2,N); ii++) for (jj=0; jj<min(j+2,N); jj++) c[ii] = a[ii,jj]*b[ii];

© 2000 Morgan Kaufman Overheads for Computers as Components Array padding zAdd array elements to change mapping into cache: a[0,0]a[0,1]a[0,2] a[1,0]a[1,1]a[1,2] before a[0,0]a[0,1]a[0,2] a[1,0]a[1,1]a[1,2] after a[0,2] a[1,2]

© 2000 Morgan Kaufman Overheads for Computers as Components Register allocation zGoals: ychoose register to hold each variable; ydetermine lifespan of variable in the register. zBasic case: within basic block.

© 2000 Morgan Kaufman Overheads for Computers as Components Register lifetime graph w = a + b; x = c + w; y = c + d; time a b c d w x y 123 t=1 t=2 t=3

© 2000 Morgan Kaufman Overheads for Computers as Components Instruction scheduling zNon-pipelined machines do not need instruction scheduling: any order of instructions that satisfies data dependencies runs equally fast. zIn pipelined machines, execution time of one instruction depends on the nearby instructions: opcode, operands.

© 2000 Morgan Kaufman Overheads for Computers as Components Reservation table zA reservation table relates instructions/time to CPU resources. Time/instrAB instr1X instr2XX instr3X instr4X

© 2000 Morgan Kaufman Overheads for Computers as Components Software pipelining zSchedules instructions across loop iterations. zReduces instruction latency in iteration i by inserting instructions from iteration i-1.

© 2000 Morgan Kaufman Overheads for Computers as Components Software pipelining in SHARC zExample: for (i=0; i<N; i++) sum += a[i]*b[i]; zCombine three iterations: yFetch array elements a, b for iteration i. yMultiply a,b for iteration i-1. yCompute sum for iteration i-2.

© 2000 Morgan Kaufman Overheads for Computers as Components Software pipelining in SHARC, cont’d /* first iteration performed outside loop */ ai=a[0]; bi=b[0]; p=ai*bi; /* initiate loads used in second iteration; remaining loads will be performed inside the loop */ for (i=2; i<N-2; i++) { ai=a[i]; bi=b[i]; /* fetch for next cycle’s multiply */ p = ai*bi; /* multiply for next iteration’s sum */ sum += p; /* make sum using p from last iteration */ } sum += p; p=ai*bi; sum +=p;

© 2000 Morgan Kaufman Overheads for Computers as Components Software pipelining timing ai=a[i]; bi=b[i]; p = ai*bi; sum += p; ai=a[i]; bi=b[i]; p = ai*bi; sum += p; ai=a[i]; bi=b[i]; p = ai*bi; sum += p; time pipe iteration i-2iteration i-1iteration i

© 2000 Morgan Kaufman Overheads for Computers as Components Instruction selection zMay be several ways to implement an operation or sequence of operations. zRepresent operations as graphs, match possible instruction sequences onto graph. * + expressiontemplates *+ * + MULADD MADD

© 2000 Morgan Kaufman Overheads for Computers as Components Using your compiler zUnderstand various optimization levels (- O1, -O2, etc.) zLook at mixed compiler/assembler output. zModifying compiler output requires care: ycorrectness; yloss of hand-tweaked code.

© 2000 Morgan Kaufman Overheads for Computers as Components Interpreters and JIT compilers zInterpreter: translates and executes program statements on-the-fly. zJIT compiler: compiles small sections of code into instructions during program execution. yEliminates some translation overhead. yOften requires more memory.