Computer Architecture
Lecture 4
17th May, 2006
Abhinav Agarwal, Veeramani V.
Recap
- Simple pipeline: hazards and solutions
- Data hazards
  - Static compiler techniques: load delay slot, etc.
  - Hardware solutions: data forwarding, out-of-order execution, register renaming
- Control hazards
  - Static compiler techniques
  - Hardware speculation through branch predictors
- Structural hazards
  - Increase hardware resources
- Superscalar out-of-order execution
- Memory organization
May 17, 2006 EE Summer Camp '06
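The data-hazard items in the recap (forwarding, the load delay) can be made concrete with a small stall counter for a classic five-stage in-order pipeline (IF, ID, EX, MEM, WB). This is a sketch under assumed textbook timing, not a model of any particular processor: the register file is written in the first half of WB and read in the second half of ID, ALU results are forwarded from EX/MEM, and load data from MEM/WB. The `Instr` record, the `stall_cycles` function and the `lw` opcode are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    op: str                # opcode; only "lw" (load word) is treated specially
    dest: Optional[str]    # destination register, or None (e.g. for a store)
    srcs: Tuple[str, ...]  # source registers


def stall_cycles(program, forwarding=True):
    """Count read-after-write stall cycles on an in-order 5-stage pipeline.

    Assumed timing: an instruction in ID at cycle d is in EX at d+1,
    MEM at d+2 and WB at d+3.
    """
    id_cycle = []      # cycle in which each instruction occupies ID
    last_writer = {}   # register -> index of its most recent producer
    stalls = 0
    next_id = 0        # ID cycle the next instruction gets if nothing stalls
    for i, ins in enumerate(program):
        start = next_id
        for reg in ins.srcs:
            if reg in last_writer:
                p = last_writer[reg]
                if not forwarding:
                    need = id_cycle[p] + 3   # read in ID while producer is in WB
                elif program[p].op == "lw":
                    need = id_cycle[p] + 2   # load data forwarded from MEM/WB
                else:
                    need = id_cycle[p] + 1   # ALU result forwarded from EX/MEM
                start = max(start, need)
        stalls += start - next_id
        id_cycle.append(start)
        next_id = start + 1
        if ins.dest is not None:
            last_writer[ins.dest] = i
    return stalls


program = [
    Instr("lw", "r1", ("r0",)),         # r1 <- mem[r0]
    Instr("add", "r2", ("r1", "r3")),   # uses r1 immediately: load-use hazard
    Instr("sub", "r4", ("r2", "r5")),   # uses r2 produced by the add
]
print(stall_cycles(program), stall_cycles(program, forwarding=False))   # 1 4
```

With forwarding only the load-use pair costs a cycle, which is exactly the slot a compiler tries to fill with an independent instruction (the load delay slot).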
Memory Organization in processors
- Caches sit inside the chip
- They are faster: physically 'closer' to the core and built from SRAM cells
- They hold recently used data
- They hold data in blocks (lines)
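Because a cache holds data in blocks, every address is split into a block offset, a set index and a tag. A minimal sketch, assuming a hypothetical cache with 64-byte blocks and 256 sets (both must be powers of two); `split_address` is an illustrative name.

```python
def split_address(addr, block_bytes=64, num_sets=256):
    """Split a byte address into (tag, set index, block offset)."""
    if block_bytes & (block_bytes - 1) or num_sets & (num_sets - 1):
        raise ValueError("block size and set count must be powers of two")
    offset_bits = block_bytes.bit_length() - 1        # 64 -> 6 bits
    index_bits = num_sets.bit_length() - 1            # 256 -> 8 bits
    offset = addr & (block_bytes - 1)                 # byte within the block
    index = (addr >> offset_bits) & (num_sets - 1)    # which set to look in
    tag = addr >> (offset_bits + index_bits)          # compared with stored tags
    return tag, index, offset


print(split_address(0x12345))   # (4, 141, 5)
# Every byte of one 64-byte block maps to the same tag and set:
print(split_address(0x12340)[:2] == split_address(0x1237F)[:2])   # True
```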
Rationale behind caches
- Principle of spatial locality
- Principle of temporal locality
- Replacement policy (LRU, LFU, etc.)
- Principle of inclusion (a lower-level cache holds a superset of the contents of the level closer to the core)
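Temporal locality is what makes a least-recently-used (LRU) policy work: a block touched recently is likely to be touched again, so the block untouched for longest is evicted. A minimal sketch of one cache set, assuming a hypothetical `LRUSet` class keyed by tag; an LFU policy would instead keep a use count per block and evict the smallest.

```python
from collections import OrderedDict


class LRUSet:
    """One set of a set-associative cache with least-recently-used replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()   # tags, ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def access(self, tag):
        """Look up `tag`; on a miss, insert it, evicting the LRU block if full."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)        # now the most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)     # evict the least recently used
        self.blocks[tag] = None
        return False


s = LRUSet(ways=2)
print([s.access(t) for t in "ABACB"])   # [False, False, True, False, False]
print(list(s.blocks))                   # ['C', 'B']
```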
Outline
- Instruction-level parallelism
- Thread-level parallelism
  - Fine-grained multithreading
  - Simultaneous multithreading
  - Shareable and non-shareable resources
- Chip multiprocessor
- Some design issues
Instruction-Level Parallelism
- Overlap the execution of many instructions
- ILP techniques try to reduce the stalls caused by data and control dependences
- Issue independent instructions out of order
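The benefit of issuing independent instructions out of order can be shown with a toy issue scheduler. This is a sketch under simplifying assumptions: registers are treated as already renamed, so only true (read-after-write) dependences block issue; each instruction is a hypothetical `(dest, srcs, latency)` tuple; and `issue_cycles` is an illustrative name.

```python
def issue_cycles(program, width=2, in_order=False):
    """Return the cycle in which each instruction issues (up to `width` per cycle)."""
    last_writer = {}
    deps = []                      # indices of earlier producers each instruction reads
    for i, (dest, srcs, _) in enumerate(program):
        deps.append([last_writer[r] for r in srcs if r in last_writer])
        last_writer[dest] = i

    issue = [None] * len(program)
    cycle = 0
    while None in issue:
        slots = width
        for i in range(len(program)):
            if issue[i] is not None:
                continue
            if slots == 0:
                break
            # Ready once every producer has issued and its latency has elapsed.
            ready = all(issue[p] is not None and issue[p] + program[p][2] <= cycle
                        for p in deps[i])
            if ready:
                issue[i] = cycle
                slots -= 1
            elif in_order:
                break              # in-order issue cannot pass a stalled instruction
        cycle += 1
    return issue


program = [
    ("r1", ("r0",), 3),   # load with an assumed 3-cycle latency
    ("r2", ("r1",), 1),   # needs the load result
    ("r3", ("r4",), 1),   # independent
    ("r5", ("r4",), 1),   # independent
]
print(issue_cycles(program))                 # [0, 3, 0, 1]
print(issue_cycles(program, in_order=True))  # [0, 3, 3, 4]
```

Out of order, the two independent instructions slip into the load's shadow, so the last issue happens in cycle 3 instead of cycle 4.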
Thread-Level Parallelism
- Instructions from two different threads are independent of each other, so more independent instructions are available
- Better utilization of functional units
- Throughput of multithreaded workloads can improve substantially
A simple pipeline [figure; source: EV8 DEC Alpha Processor, (c) Intel]
Superscalar pipeline [figure; source: EV8 DEC Alpha Processor, (c) Intel]
Speculative execution [figure; source: EV8 DEC Alpha Processor, (c) Intel]
Fine-Grained Multithreading [figure; source: EV8 DEC Alpha Processor, (c) Intel]
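Fine-grained multithreading switches threads every cycle, so one thread's stall can be covered by another thread's instruction. A minimal sketch for a hypothetical single-issue core with round-robin selection that skips stalled threads; each thread is modelled only as a list of latencies (how long before that thread may issue again), a stand-in for dependences and cache misses. `fine_grained_timeline` is an illustrative name.

```python
def fine_grained_timeline(threads):
    """Thread id issued in each cycle (None = idle) on a single-issue FGMT core."""
    n = len(threads)
    pc = [0] * n            # per-thread program counter (replicated state)
    ready_at = [0] * n      # cycle at which each thread may issue again
    last = n - 1            # so that thread 0 gets the first turn
    timeline = []
    cycle = 0
    while any(pc[t] < len(threads[t]) for t in range(n)):
        chosen = None
        for k in range(1, n + 1):           # round-robin, starting after `last`
            t = (last + k) % n
            if pc[t] < len(threads[t]) and ready_at[t] <= cycle:
                chosen = t
                break
        if chosen is not None:
            ready_at[chosen] = cycle + threads[chosen][pc[chosen]]
            pc[chosen] += 1
            last = chosen
        timeline.append(chosen)
        cycle += 1
    return timeline


one = [1, 3, 1]   # hypothetical thread: the second instruction has a 3-cycle latency
print(fine_grained_timeline([one]))        # [0, 0, None, None, 0]
print(fine_grained_timeline([one, one]))   # [0, 1, 0, 1, None, 0, 1]
```

Alone, the thread finishes 3 instructions in 5 cycles; interleaved, two such threads finish 6 instructions in 7 cycles instead of 10 back to back.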
Simultaneous Multithreading [figure; source: EV8 DEC Alpha Processor, (c) Intel]
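The difference between a plain superscalar core, fine-grained multithreading and SMT is how many issue slots get filled each cycle. A toy comparison, assuming a hypothetical per-thread profile of ready instructions per cycle (instructions that do not issue are simply dropped, not carried over); `slots_used` and the profile numbers are illustrative.

```python
def slots_used(ready, width, policy):
    """Issue slots filled per cycle.

    ready[t][c]: instructions thread t has ready to issue in cycle c.
    policy: "single" (thread 0 only, plain superscalar), "fgmt" (one thread
    per cycle, round-robin) or "smt" (any threads may share a cycle).
    """
    n = len(ready)
    used = []
    for c in range(len(ready[0])):
        if policy == "single":
            used.append(min(width, ready[0][c]))
        elif policy == "fgmt":
            used.append(min(width, ready[c % n][c]))
        elif policy == "smt":
            free = width
            for k in range(n):              # rotate thread priority each cycle
                free -= min(free, ready[(c + k) % n][c])
            used.append(width - free)
        else:
            raise ValueError(f"unknown policy: {policy}")
    return used


ready = [[2, 0, 1, 0],    # thread 0: bursty, little ILP
         [1, 2, 2, 3]]    # thread 1
for policy in ("single", "fgmt", "smt"):
    print(policy, slots_used(ready, width=4, policy=policy))
# single [2, 0, 1, 0]
# fgmt [2, 2, 1, 3]
# smt [3, 2, 3, 3]
```

Fine-grained multithreading removes the empty cycles (vertical waste), but only SMT also fills leftover slots within a cycle (horizontal waste).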
Out-of-Order Execution [figure; source: EV8 DEC Alpha Processor, (c) Intel]
SMT pipeline [figure; source: EV8 DEC Alpha Processor, (c) Intel]
Resources: replication required
- Program counters
- Register maps
Replication not required
- Register file (rename space)
- Instruction queue
- Branch predictor
- First- and second-level caches, etc.
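The split between replicated and shared resources shows up directly in register renaming: each thread needs its own PC and its own architectural-to-physical register map, while all threads draw from one pool of physical registers. A minimal sketch of a hypothetical `SmtRenamer`; freeing physical registers at commit is omitted to keep it short.

```python
from collections import deque


class SmtRenamer:
    """Register renaming for a hypothetical SMT core."""

    def __init__(self, threads, arch_regs, phys_regs):
        if phys_regs < threads * arch_regs:
            raise ValueError("need a physical register for every architectural one")
        self.free = deque(range(phys_regs))            # shared: physical register pool
        self.pc = [0] * threads                        # replicated: one PC per thread
        self.maps = [{r: self.free.popleft() for r in range(arch_regs)}
                     for _ in range(threads)]          # replicated: one map per thread

    def rename(self, tid, dest, srcs):
        """Rename one instruction of thread `tid`; returns (phys_dest, phys_srcs)."""
        table = self.maps[tid]
        phys_srcs = [table[r] for r in srcs]           # read through this thread's map
        if not self.free:
            raise RuntimeError("no free physical register: rename must stall")
        table[dest] = self.free.popleft()              # fresh register removes WAR/WAW
        self.pc[tid] += 1                              # advance this thread's PC (one slot per instruction here)
        return table[dest], phys_srcs


r = SmtRenamer(threads=2, arch_regs=4, phys_regs=12)
print(r.rename(0, dest=1, srcs=[2, 3]))   # (8, [2, 3])
print(r.rename(1, dest=1, srcs=[1]))      # (9, [5])
```

Both threads write architectural register 1, yet they receive different physical registers, so the shared register file needs no per-thread partitioning.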
Chip multiprocessor
- Transistor counts keep rising
- Put more than one core on the chip
- The cores can still share caches
Some design issues
- Choosing the cache size: a larger cache raises the hit rate but is slower to access and costs area and power
- Power versus performance
- Super-pipelining trade-off: a higher clock frequency, but a larger speculation (misprediction) penalty and higher power
- Power consumption
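The super-pipelining trade-off can be put in numbers with a simple model. Assume total logic delay L split over n stages with latch overhead l per stage, and m mispredictions per instruction that each flush roughly n cycles. Then time per instruction is T(n) = (L/n + l)(1 + m·n) = L/n + L·m + l + l·m·n, which is minimized at n* = sqrt(L / (l·m)). With the hypothetical values L = 4 ns, l = 0.25 ns and m = 0.04, n* = sqrt(400) = 20. The sketch below checks this numerically; power is not modelled.

```python
def time_per_instruction(stages, logic_ns=4.0, latch_ns=0.25,
                         mispredicts_per_instr=0.04):
    """Average time per instruction (ns) for a pipeline with `stages` stages."""
    cycle_ns = logic_ns / stages + latch_ns            # deeper -> shorter clock period
    cpi = 1 + mispredicts_per_instr * stages           # deeper -> costlier flushes
    return cycle_ns * cpi


best = min(range(1, 61), key=time_per_instruction)
print(best, round(time_per_instruction(best), 3))      # 20 0.81
```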
Novel techniques for power
- Clock gating
- Run non-critical elements on a slower clock
- Reduce voltage swings (lower the operating voltage)
- Sleep mode / standby mode
- Dynamic voltage and frequency scaling (DVFS)
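Most of these techniques act on a term of the standard CMOS dynamic-power relation P = a·C·V²·f: clock gating lowers the activity factor a of idle blocks, a slower clock lowers f, and a lower supply voltage lowers the V² term. DVFS lowers V and f together, so power falls roughly with the cube of the scaling factor. Sleep and standby modes mainly target leakage, which this relation ignores. The operating points below are hypothetical.

```python
def dynamic_power(activity, cap_farads, vdd_volts, freq_hz):
    """Switching power of CMOS logic, P = a * C * Vdd^2 * f (leakage ignored)."""
    return activity * cap_farads * vdd_volts ** 2 * freq_hz


# Hypothetical operating points for the same chip.
nominal = dynamic_power(0.1, 1e-9, 1.2, 2.0e9)   # 1.2 V, 2.0 GHz
scaled = dynamic_power(0.1, 1e-9, 0.9, 1.5e9)    # DVFS: 0.9 V, 1.5 GHz
print(round(nominal, 4), round(scaled, 4), round(scaled / nominal, 4))
# 0.288 0.1215 0.4219
```

Scaling both V and f by 0.75 cuts power to 0.75³ ≈ 0.42 of nominal. Because the task also takes 1/0.75 times longer, energy per task falls only by the V² factor, to 0.75² ≈ 0.56.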