Published by Arron Fowler. Modified over 9 years ago.
1
The University of Texas at Austin | Lizy Kurian John, LCA, UT Austin
What Programming Language/Compiler Researchers Should Know about Computer Architecture
Lizy Kurian John
Department of Electrical and Computer Engineering
The University of Texas at Austin
2
Somebody once said: "Computers are dumb actors and compilers/programmers are the master playwrights."
3
Computer Architecture Basics
- ISAs: RISC vs. CISC
- Assembly language coding
- Datapath (ALU) and controller
- Pipelining
- Caches
- Out-of-order execution
- Reference: Hennessy and Patterson architecture books
4
Basics
- Instruction-, data-, and thread-level parallelism (ILP, DLP, TLP)
- Massive parallelism: SIMD/MIMD
- VLIW
- Performance and power metrics
- References: Hennessy and Patterson architecture books; ASPLOS, ISCA, MICRO, and HPCA proceedings
5
The Bottom Line
- The choice of programming language affects performance and power (e.g., Java).
- Compilers affect performance and power.
6
A Java Hardware Interpreter
Radhakrishnan, Ph.D. 2000 (ISCA 2000, ICS 2001)
This technique was used by Nazomi Communications and Parthus (Chicory Systems).
[Figure: a Java class file (bytecodes) and a native executable (native machine instructions) feed a processor pipeline: Fetch → Hardware bytecode translator → Decode → Execute.]
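To make the idea of a bytecode translator more concrete, here is a minimal software sketch of the kind of mapping such a unit performs: stack-oriented bytecodes become register-oriented native instructions by assigning each operand-stack slot to a register. The JVM opcodes (`iload_<n>`, `iconst_<n>`, `iadd`, `isub`, `imul`, `istore_<n>`) are real, but the target instruction syntax (`mov`, `li`, `add`, ...), the register names `s<k>`/`l<n>`, and the `translate` function are hypothetical. This is an illustration under those assumptions, not the Hard-Int design.

```python
# Hypothetical sketch: translate JVM-style stack bytecodes into an invented
# 3-operand register ISA. Operand-stack slot k lives in register s<k>;
# local variable n lives in register l<n>.

BINOPS = {"iadd": "add", "isub": "sub", "imul": "mul"}


def translate(bytecodes: list[str]) -> list[str]:
    depth = 0  # operand-stack depth, tracked statically at translation time
    native = []
    for op in bytecodes:
        if op.startswith("iload_"):
            n = op.split("_", 1)[1]
            native.append(f"mov s{depth}, l{n}")  # push local n
            depth += 1
        elif op.startswith("iconst_"):
            suffix = op.split("_", 1)[1]
            value = -1 if suffix == "m1" else int(suffix)
            native.append(f"li s{depth}, {value}")  # push a small constant
            depth += 1
        elif op in BINOPS:
            # Pop two operands, push one result into the lower slot.
            depth -= 1
            native.append(f"{BINOPS[op]} s{depth - 1}, s{depth - 1}, s{depth}")
        elif op.startswith("istore_"):
            n = op.split("_", 1)[1]
            depth -= 1
            native.append(f"mov l{n}, s{depth}")  # pop into local n
        else:
            raise ValueError(f"unsupported bytecode: {op}")
    return native


print(translate(["iload_1", "iload_2", "iadd", "istore_3"]))
```

For the Java statement `c = a + b` compiled to `iload_1, iload_2, iadd, istore_3`, the sketch emits two moves, one `add`, and a final move into the local. A hardware translator does this per fetch group in the pipeline rather than ahead of time, which is what removes the software interpreter's dispatch loop.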
7
Hard-Int Performance
- Hard-Int performs consistently better than the interpreter.
- In JIT mode, it gives a significant performance boost in 4 of 5 applications.
8
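Compiler and Power
[Figure: a data dependence graph (DDG) over instructions A to F and two schedules of it, each shown over Cycles 1 to 4. First schedule: peak power = 3, energy = 6. Second schedule: peak power = 2, energy = 6.]
Both schedules consume the same energy, but the second lowers peak power by issuing fewer instructions in its busiest cycle.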
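The effect above can be reproduced with a small list scheduler. This is a minimal sketch under stated assumptions: unit latency, unit energy per instruction, and "power" in a cycle taken as the number of instructions issued in that cycle. The DDG below is invented for illustration (the slide's exact edges are not recoverable), and `list_schedule` and `peak_power_and_energy` are hypothetical names.

```python
# Hypothetical sketch: list-schedule a DDG with a per-cycle issue cap.
# Assumes unit latency and unit energy per instruction; per-cycle power is
# modeled as the number of instructions issued in that cycle.


def list_schedule(deps: dict[str, set[str]], order: list[str], width: int) -> list[list[str]]:
    done: set[str] = set()
    remaining = list(order)  # priority order among ready instructions
    cycles: list[list[str]] = []
    while remaining:
        # Ready = every predecessor completed in an earlier cycle.
        ready = [i for i in remaining if deps[i] <= done]
        issued = ready[:width]
        if not issued:
            raise ValueError("cycle in dependence graph")
        cycles.append(issued)
        done.update(issued)
        remaining = [i for i in remaining if i not in issued]
    return cycles


def peak_power_and_energy(cycles: list[list[str]]) -> tuple[int, int]:
    per_cycle = [len(c) for c in cycles]
    return max(per_cycle), sum(per_cycle)


# Invented DDG (not the slide's): C needs A and B, E needs C, F needs D.
DDG = {"A": set(), "B": set(), "C": {"A", "B"}, "D": set(), "E": {"C"}, "F": {"D"}}
ORDER = ["A", "B", "C", "D", "E", "F"]

for width in (3, 2):
    sched = list_schedule(DDG, ORDER, width)
    peak, energy = peak_power_and_energy(sched)
    print(f"width={width}: {sched} peak power={peak} energy={energy}")
```

With an issue width of 3 the scheduler produces A, B, D / C, F / E (peak power 3); capping the width at 2 gives A, B / C, D / E, F (peak power 2). Both take three cycles and both have energy 6, which mirrors the slide's point that a power-aware scheduler can trade per-cycle parallelism for a lower peak without changing total energy.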
9
Valluri et al., 2001 (HPCA workshop)
- Quantitative study: examined the influence of state-of-the-art compiler optimizations on processor energy and power.
- Optimizations studied:
  - Standard levels -O1 to -O4 of DEC Alpha's cc compiler
  - Four individual optimizations: simple basic-block instruction scheduling, loop unrolling, function inlining, and aggressive global scheduling
10
Standard Optimizations on Power
11
Somebody once said: "Computers are dumb actors and compilers/programmers are the master playwrights."
12
A large part of modern out-of-order processors is hardware that could have been eliminated if a good compiler existed.
13
Let me get more arrogant:
A large part of modern out-of-order processors was designed because computer architects thought compiler writers could not do a good job.
14
Value Prediction
Is a slap in your face
(Shen and Lipasti)
15
Value Locality
- Definition: the likelihood that an instruction's computed result, or a similar, predictable result, will recur soon.
- Observation: a limited set of unique values constitutes the majority of values produced and consumed during execution.
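One plausible way to quantify the definition above is to check, for each dynamic instance of a static instruction (identified by its PC), whether its result matches one of the last few results that same instruction produced. The sketch below assumes a trace of `(pc, value)` pairs and a "history depth" parameter; the function name and the trace are hypothetical, and this is not the exact methodology behind the charts in these slides.

```python
from collections import defaultdict, deque


def value_locality(trace: list[tuple[int, int]], depth: int = 1) -> float:
    """Fraction of dynamic instructions whose result matches one of the last
    `depth` results produced by the same static instruction (same PC)."""
    # Per-PC history of recent results; the deque discards the oldest value
    # once it holds `depth` entries.
    history = defaultdict(lambda: deque(maxlen=depth))
    hits = 0
    for pc, value in trace:
        recent = history[pc]
        if value in recent:
            hits += 1
        recent.append(value)
    return hits / len(trace) if trace else 0.0


# Invented trace: a load at 0x40 that mostly returns 5, and one at 0x80
# that keeps returning 0 (as a sparse-data load might).
trace = [(0x40, 5), (0x40, 5), (0x40, 7), (0x40, 5), (0x80, 0), (0x80, 0)]
print(value_locality(trace, depth=1), value_locality(trace, depth=2))
```

On this trace, a history depth of 1 catches 2 of 6 results, while a depth of 2 also catches the 5 that reappears after the intervening 7, giving 3 of 6. Larger histories expose more locality but cost more predictor storage.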
16
Load Value Locality
17
Causes of Value Locality
- Data redundancy: many 0s, sparse matrices, white space in files, empty cells in spreadsheets
- Program constants
- Computed branches: the base address of a jump table is a run-time constant
- Virtual function calls: these involve code to load a function pointer, which can be constant
18
Causes of Value Locality (continued)
- Memory alias resolution: the compiler conservatively generates code that may contain stores that alias with loads
- Register spill code: stores and subsequent loads of the same values
- Convergent algorithms: parts of an algorithm converge before global convergence
- Polling algorithms
19
Two Extremist Views
- Anything that can be done in hardware should be done in hardware.
- Anything that can be done in software should be done in software.
20
What do we need?
The dumb actor?
Or the defiant actor, who pays very little attention to the script?
21
Challenging All Compiler Writers
- The last 15 years were the defiant actor's era.
- What about the next 15? TLP, multithreading, parallelizing compilers.
- It's time for a lot more dumb acting from the architect's side, and for some good scriptwriting from the compiler writer's side.
22
BACKUP
23
Compiler Optimizations
- cc: native C compiler on a DEC Alpha 21064 running the OSF/1 operating system
- gcc: used to study the effect of individual optimizations
24
Standard Optimization Levels in cc
- -O0: no optimizations performed
- -O1: local optimizations such as common subexpression elimination (CSE), copy propagation, induction variable elimination (IVE), etc.
- -O2: inline expansion of static procedures, plus global optimizations such as loop unrolling and instruction scheduling
- -O3: inline expansion of global procedures
- -O4: software pipelining, loop vectorization, etc.
25
Standard Optimization Levels in gcc
- -O0: no optimizations performed
- -O1: local optimizations such as CSE, copy propagation, dead-code elimination, etc.
- -O2: aggressive instruction scheduling
- -O3: inlining of procedures
NOTE:
- Each level of cc and gcc performs almost the same optimizations.
- In cc and gcc, the optimizations that increase ILP are in levels -O2, -O3, and -O4.
- cc was used wherever possible; gcc was used where specific hooks were required.
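To illustrate two of the -O1 local optimizations listed above, here is a minimal sketch of common subexpression elimination combined with copy propagation over one basic block. It assumes a hypothetical three-address form `(dest, op, src1, src2)` and that every variable is assigned at most once in the block, so an available expression is never invalidated. Real compilers work on their own IR and must also handle redefinitions and memory side effects.

```python
# Hypothetical three-address form: (dest, op, src1, src2).
# Assumes single assignment within the block, so no invalidation is needed.
COMMUTATIVE = {"add", "mul"}


def cse_with_copy_propagation(block):
    available = {}  # (op, src1, src2) -> variable already holding that value
    copy_of = {}    # variable -> earlier variable it merely copies
    out = []
    for dest, op, a, b in block:
        # Copy propagation: read operands through any recorded copies.
        a = copy_of.get(a, a)
        b = copy_of.get(b, b)
        key = (op, a, b)
        if op in COMMUTATIVE and b < a:
            key = (op, b, a)  # canonical order so x+y matches y+x
        if key in available:
            # CSE: reuse the earlier result instead of recomputing it.
            copy_of[dest] = available[key]
            out.append((dest, "copy", available[key], ""))
        else:
            available[key] = dest
            out.append((dest, op, a, b))
    return out


block = [("t1", "add", "x", "y"), ("t2", "add", "y", "x"), ("t3", "mul", "t2", "z")]
for instr in cse_with_copy_propagation(block):
    print(instr)
```

The second addition becomes a copy of `t1`, and the multiply now reads `t1` directly. If nothing later uses `t2`, its copy is dead code, which the dead-code elimination pass at the same level would remove. Fewer executed instructions is exactly the property that the energy observations later in these slides tie to lower energy.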
26
Individual Optimizations
Four gcc optimizations, all applied on top of -O1:
- -fschedule-insns: local register allocation followed by basic-block list scheduling
- -fschedule-insns2: postpass scheduling
- -finline-functions: integrates all simple functions into their callers
- -funroll-loops: performs loop unrolling
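gcc's `-funroll-loops` operates on compiled code, but the transformation itself can be sketched at source level. In this minimal Python illustration (the function names and the `trips` counter are hypothetical), the loop body is replicated four times and a remainder loop handles the leftover elements. The additions stay in their original order, so the result is identical; what changes is the number of loop-control trips, i.e., fewer compare-and-branch instructions per unit of work.

```python
def sum_rolled(xs):
    total, trips = 0, 0
    for i in range(len(xs)):
        trips += 1          # one loop-control check per element
        total += xs[i]
    return total, trips


def sum_unrolled_by_4(xs):
    total, trips = 0, 0
    n, i = len(xs), 0
    while i + 4 <= n:       # main loop: four copies of the body per trip
        trips += 1
        total += xs[i]
        total += xs[i + 1]
        total += xs[i + 2]
        total += xs[i + 3]
        i += 4
    while i < n:            # remainder loop for the last n % 4 elements
        trips += 1
        total += xs[i]
        i += 1
    return total, trips


print(sum_rolled(list(range(10))), sum_unrolled_by_4(list(range(10))))
```

For ten elements, the rolled loop makes 10 trips while the unrolled version makes 2 main trips plus 2 remainder trips. Removing loop overhead reduces the instruction count, and the larger loop body gives the scheduler more independent work per iteration, which is the ILP increase these levels target.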
27
Some Observations
- Energy consumption drops when the number of instructions is reduced; i.e., when less total work is done, less energy is consumed.
- Power dissipation is directly proportional to IPC.
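Both observations follow from a simple model. Assume a program executes $N$ dynamic instructions at clock frequency $f$, and that the average energy per instruction $\bar{E}_{\text{inst}}$ is roughly constant across optimizations (a simplifying assumption, not a result of the study):

$$
E = N\,\bar{E}_{\text{inst}}, \qquad
T = \frac{N \cdot \mathrm{CPI}}{f} = \frac{N}{\mathrm{IPC} \cdot f}, \qquad
P = \frac{E}{T} = \bar{E}_{\text{inst}} \cdot f \cdot \mathrm{IPC}
$$

Under this model, energy depends only on how many instructions execute, while average power, at a fixed $f$ and $\bar{E}_{\text{inst}}$, grows linearly with IPC. An optimization that both removes instructions and raises IPC can therefore lower energy while raising power.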
28
Observations (continued)
- Function inlining was found to be good for both power and energy.
- Loop unrolling was found to be good for energy consumption but bad for power dissipation.
29
MMX/SIMD
Automatic use of SIMD ISAs is still difficult more than 10 years after the introduction of MMX.
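To show what a compiler must target when it vectorizes, here is a sketch that emulates the lane-wise semantics of a packed, wraparound 8-bit add (the kind of operation MMX's `PADDB` performs) on eight bytes packed into one 64-bit integer. This is SIMD-within-a-register in software, not real SIMD code, and the helper names are hypothetical. The hard part the slide refers to is the compiler recognizing that an ordinary element-by-element loop can be regrouped like `add_arrays_u8`.

```python
LOW7 = 0x7F7F7F7F7F7F7F7F  # low 7 bits of every byte lane
HIGH = 0x8080808080808080  # top bit of every byte lane


def packed_add_u8(x: int, y: int) -> int:
    # Adding only the low 7 bits of each lane cannot carry into the next lane
    # (at most 0x7F + 0x7F = 0xFE); the top bit of each lane is fixed up with XOR.
    return ((x & LOW7) + (y & LOW7)) ^ ((x ^ y) & HIGH)


def pack(lanes: list[int]) -> int:
    assert len(lanes) == 8 and all(0 <= v <= 0xFF for v in lanes)
    return sum(v << (8 * i) for i, v in enumerate(lanes))


def unpack(word: int) -> list[int]:
    return [(word >> (8 * i)) & 0xFF for i in range(8)]


def add_arrays_u8(a: list[int], b: list[int]) -> list[int]:
    """Vectorized form of `[(x + y) % 256 for x, y in zip(a, b)]`:
    one packed add handles eight elements at a time."""
    assert len(a) == len(b) and len(a) % 8 == 0
    out: list[int] = []
    for i in range(0, len(a), 8):
        out += unpack(packed_add_u8(pack(a[i:i + 8]), pack(b[i:i + 8])))
    return out


a = [250, 1, 2, 3, 128, 255, 0, 100]
b = [10, 1, 2, 3, 128, 1, 0, 27]
print(add_arrays_u8(a, b))
```

The scalar loop and the packed version compute the same wraparound sums; the packed version issues one add per eight elements. Real vectorizers must also prove that the arrays don't overlap, handle lengths that aren't multiples of the lane count, and choose the right element width, which is part of why automatic SIMD use remained difficult.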
30
Standard Optimizations on Power (continued)