10/27: Lecture Topics
Survey results
Current Architectural Trends
Operating Systems Intro
  –What is an OS?
  –Issues in operating systems

Superscalar Pipelines
Superscalar pipelines can execute multiple instructions at once
  –2+ instructions in any stage of the pipeline
Some processors allow 8 instructions to be issued at once
Most programs can only fill 1 or 2 issue slots per cycle

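Why wide issue often goes unused can be sketched with a toy model. The function below is only an illustration, not a description of any real processor: `cycles_in_order`, the register names, and the one-cycle latency are all assumptions. It issues instructions in program order and stops filling slots as soon as an instruction needs a result produced in the same cycle.

```python
from typing import List, Tuple

# Each instruction is (destination register, [source registers]).
Instr = Tuple[str, List[str]]

def cycles_in_order(program: List[Instr], width: int) -> int:
    """Count issue cycles for an in-order machine with `width` issue slots.

    Simplifying assumptions: every instruction takes one cycle, and an
    instruction must wait only if it reads a register written by an
    instruction issued in the same cycle.
    """
    cycles = 0
    i = 0
    while i < len(program):
        written = set()   # registers written by instructions issued this cycle
        issued = 0
        while i < len(program) and issued < width:
            dest, srcs = program[i]
            if any(s in written for s in srcs):
                break     # dependency: wait for the next cycle
            written.add(dest)
            issued += 1
            i += 1
        cycles += 1
    return cycles

# A dependent chain cannot use extra slots; independent work can.
chain = [("r1", ["r0"]), ("r2", ["r1"]), ("r3", ["r2"]), ("r4", ["r3"])]
independent = [("r1", ["r0"]), ("r2", ["r0"]), ("r3", ["r0"]), ("r4", ["r0"])]
print(cycles_in_order(chain, 8))        # 4
print(cycles_in_order(independent, 8))  # 1
```

With 8 slots the dependent chain still needs 4 cycles, which is the sense in which most programs fill only 1 or 2 slots.
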
Out-of-Order Execution
Allows the processor to execute any instruction whose operands are ready, regardless of program order
Enables more issue slots to be filled
Often out-of-order execution, but in-order commit
  –that is, write back results in the order they should have occurred
Note: IA-64 is in-order

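In-order commit can be sketched as follows. This is a minimal model under stated assumptions: `commit_cycles` is a hypothetical helper, the finish times are made-up inputs, and any number of instructions may commit in the same cycle. An instruction commits once it has finished and every earlier instruction has committed.

```python
from typing import List

def commit_cycles(finish_cycle: List[int]) -> List[int]:
    """Return the cycle in which each instruction commits.

    finish_cycle[i] is the cycle in which instruction i (in program order)
    finishes executing; instructions may finish out of order.
    """
    commits = []
    last_commit = 0
    for done in finish_cycle:
        # Cannot commit before finishing, nor before earlier instructions commit.
        last_commit = max(last_commit, done)
        commits.append(last_commit)
    return commits

# Instructions 1 and 2 finish early but wait for instruction 0 to commit.
print(commit_cycles([5, 2, 3, 7, 4]))  # [5, 5, 5, 7, 7]
```
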
Longer Pipelines
Pipelines are getting longer
  –original RISC pipelines had 5 stages
  –pipelines now have up to 20 stages
Shorter stages allow a very short clock cycle (a fast clock)
Okay as long as you can accurately predict branches (or get rid of them)

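The cost of mispredicted branches in a long pipeline can be estimated with a rough formula. The sketch below is a common back-of-the-envelope model, not something taken from the slides: the function name, the 20% branch fraction, and the assumption that a misprediction costs about one pipeline's worth of cycles are all illustrative.

```python
def effective_cpi(branch_fraction: float,
                  mispredict_rate: float,
                  flush_penalty: float,
                  base_cpi: float = 1.0) -> float:
    """Average cycles per instruction including branch-misprediction stalls.

    branch_fraction: fraction of instructions that are branches (assumed)
    mispredict_rate: fraction of branches predicted wrongly
    flush_penalty:   cycles lost per misprediction (assumed ~ pipeline depth)
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty

# Hypothetical workload: 20% branches, 20-cycle penalty.
good = effective_cpi(0.2, 0.05, 20)   # roughly 1.2 with a 95%-accurate predictor
poor = effective_cpi(0.2, 0.50, 20)   # roughly 3.0 with a coin-flip predictor
print(round(good, 2), round(poor, 2))
```

Under these assumed numbers, poor prediction more than doubles the cycles per instruction, which is why long pipelines depend on accurate prediction.
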
Speculation
Prediction
  –better branch predictors (95% accurate)
  –predict many levels of branches
  –predict variable values
  –predict load addresses
Simultaneously execute both paths of a branch
Execute instructions even if there could be a dependency
  –sw after lw could be the same address, but probably not
  –let the sw execute and then fix it if you were wrong

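As one concrete example of branch prediction, here is a sketch of the classic two-bit saturating counter. The slides do not say which predictor they have in mind; this scheme is swapped in purely for illustration, and the class name, starting state, and test pattern are assumptions.

```python
from typing import List

class TwoBitPredictor:
    """Two-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self) -> None:
        self.counter = 2  # assumed starting state: weakly taken

    def predict(self) -> bool:
        return self.counter >= 2

    def update(self, taken: bool) -> None:
        # Move toward 3 on taken branches and toward 0 on not-taken ones.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

def accuracy(outcomes: List[bool]) -> float:
    """Fraction of branch outcomes the predictor guesses correctly."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)

# A loop branch: taken 9 times, then falls through, repeated 10 times.
loop_branch = ([True] * 9 + [False]) * 10
print(accuracy(loop_branch))  # 0.9
```

The single miss per loop exit does not flip the prediction, because one not-taken outcome only moves the counter from 3 to 2.
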
Predicated Execution
Predicated execution allows conditional moves and conditional adds instead of only conditional branches
Avoids branches, which are expensive because pipelines are so long
IA-64: almost everything is predicated (many 1-bit predicate registers)
The HW problem with movn and movz was an example of this

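The idea behind conditional moves can be sketched by emulating MIPS `movn` and `movz` in Python. This is only an illustration: the Python functions return the new value of the destination register rather than modifying it, and `abs_predicated` is a hypothetical example, not the problem referred to above.

```python
def movn(rd: int, rs: int, rt: int) -> int:
    """Emulate MIPS movn: rd takes the value of rs if rt != 0, else rd is unchanged."""
    return rs if rt != 0 else rd

def movz(rd: int, rs: int, rt: int) -> int:
    """Emulate MIPS movz: rd takes the value of rs if rt == 0, else rd is unchanged."""
    return rs if rt == 0 else rd

def abs_predicated(x: int) -> int:
    """Absolute value as straight-line code: both candidates are computed,
    and a conditional move selects one, so no branch is needed."""
    result = x                        # move  result, x
    negated = -x                      # sub   negated, $zero, x
    is_negative = 1 if x < 0 else 0   # slt   is_negative, x, $zero
    result = movn(result, negated, is_negative)
    return result

print(abs_predicated(-5), abs_predicated(7))  # 5 7
```

The trade-off is that the negation is always computed, even when its result is thrown away.
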
VLIW
Long Instruction Words (LIW) and Very Long Instruction Words (VLIW)
  –each instruction contains multiple smaller instructions that execute in parallel
  –(V)LIW instructions can be 128 to 1024 bits long and contain 3 to 16 instructions
It's the compiler's job to find independent instructions to execute

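The compiler's job of grouping independent instructions can be sketched as a greedy packer. This is a toy, not how any production VLIW compiler works. It assumes each register is written at most once, that producers appear before their consumers, and that all operations take one cycle; `pack_bundles` and the register names are hypothetical.

```python
from typing import List, Tuple

# Each operation is (destination register, [source registers]).
Op = Tuple[str, List[str]]

def pack_bundles(program: List[Op], slots: int) -> List[List[str]]:
    """Greedily pack operations into long instruction words of `slots` operations.

    An operation may join a bundle only if every operation that produces one
    of its sources was placed in an EARLIER bundle.
    """
    producer = {dest: i for i, (dest, _) in enumerate(program)}
    scheduled = set()                 # indices already placed in a bundle
    remaining = list(range(len(program)))
    bundles = []
    while remaining:
        ready = [i for i in remaining
                 if all(s not in producer or producer[s] in scheduled
                        for s in program[i][1])]
        bundle = ready[:slots]
        if not bundle:
            raise ValueError("no ready operation: input violates the assumptions")
        bundles.append([program[i][0] for i in bundle])
        scheduled.update(bundle)
        remaining = [i for i in remaining if i not in scheduled]
    return bundles

ops = [("a", ["x"]), ("b", ["a"]), ("c", ["y"]), ("d", ["c"]), ("e", ["b", "d"])]
print(pack_bundles(ops, 3))  # [['a', 'c'], ['b', 'd'], ['e']]
```

Each inner list stands for one long instruction word, named by the registers it writes. Unused slots would be filled with no-ops.
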
Register Windows
Saving registers on the stack during procedure calls hurts performance
Register windows use a stack of registers that are allocated to a procedure as it needs them

Local Name   Actual Name
...          ...
             r76
             r75
             r74
             r73
t2           r72
t1           r71
t0           r70
t1           r69
t0           r68
t2           r67
t1           r66
t0           r65
...          ...

Each run of local names starting at t0 is the window of one procedure: Foo(), Bar(), Baz()

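The mapping from local names to actual registers can be sketched as a stack of windows. This is a simplified model, not a specific architecture's mechanism: the class and method names are hypothetical, and the starting register r65 and window sizes are taken from the table purely for illustration.

```python
from typing import List, Tuple

class RegisterWindows:
    """Toy register-window allocator: each call gets the next free registers."""

    def __init__(self, first_register: int = 65) -> None:
        self.next_free = first_register
        self.frames: List[Tuple[int, int]] = []   # stack of (base, size)

    def call(self, num_locals: int) -> None:
        """Enter a procedure that needs `num_locals` local registers."""
        self.frames.append((self.next_free, num_locals))
        self.next_free += num_locals

    def ret(self) -> None:
        """Leave the current procedure and release its window."""
        base, _ = self.frames.pop()
        self.next_free = base

    def actual(self, local_index: int) -> str:
        """Translate local register t<local_index> to its actual register name."""
        base, size = self.frames[-1]
        if not 0 <= local_index < size:
            raise IndexError("local register outside the current window")
        return f"r{base + local_index}"

windows = RegisterWindows()
windows.call(3)             # first procedure:  t0-t2 -> r65-r67
windows.call(2)             # second procedure: t0-t1 -> r68-r69
windows.call(3)             # third procedure:  t0-t2 -> r70-r72
print(windows.actual(0))    # r70
windows.ret()
print(windows.actual(0))    # r68
```

No registers are copied to memory on call or return; only the window base changes. The sketch ignores what happens when the register file runs out.
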
Smarter Compilers
VLIW requires good compilers
Predicated execution and speculation need help from the compiler
Old architectures had instructions to emulate high-level constructs (bad)
New architectures provide many general instructions and instruction options
IA-64 will keep compiler writers busy for a decade

Multiple CPUs on a Chip
Chip multiprocessors
  –multiple simple CPUs that share a cache
  –can run multiple programs simultaneously
  –single programs are no faster
  –like a multiprocessor machine but cheaper
Simultaneous Multithreading (SMT)
  –more complex CPUs
  –like chip multiprocessors + superscalar + out-of-order
  –also improves single program performance
  –developed at UW
  –memory bandwidth is an issue for both

Funky Hardware on a Chip
We can squeeze more and more transistors on a chip
What do we do with them?
Bigger caches (boring)
Put programmable hardware on the CPU
  –FPGAs can be (re)programmed quickly
  –dedicated hardware can run up to 1000X faster than software
Graphics-specific hardware
Instruction Co-Processors
Simultaneously run two copies of all programs to avoid hardware glitches

Low Power
CPUs are being put in everything, even devices that have very small batteries (tiny sensors)
Need to make CPUs that use very little power (only as much as they need)
  –reduce the CPU clock frequency
  –allow the OS to turn off part of the chip
Transmeta is building chips that emulate Intel x86 but use less power

Time to Market
It used to be solely about being the fastest
Now being adequate is enough
Being the first technology to fill a need is what matters most