Computer Architecture – Lecture 3 (Abhinav Agarwal, Veeramani V.)
Quick recap – Pipelining (figure source: http://cse.stanford.edu/class/sophomore-college/projects-00/risc/pipelining/)
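Quick recap – Problems
Data hazards: dependent instructions, e.g. add r1, r2, r3 followed by store r1, 0(r4), where the store needs the new value of r1.
Control hazards: branch resolution. After bnz r1, label, the pipeline does not know whether to fetch the next instruction or label: sub r1, r2, r3 until the branch is resolved.
Structural hazards: two instructions in different stages need the same hardware resource in the same cycle.
[Figure: five overlapping instructions, each passing through IF, ID/RF, EX, MEM, WB]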
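Data Hazards
RAW hazard (Read after Write): add r1, r2, r3 then store r1, 0(r4); the store must read r1 only after the add has written it.
WAW hazard (Write after Write): div r1, r3, r4 … add r1, r10, r5; if the short add writes r1 before the long-latency div does, r1 is left holding the stale div result.
WAR hazard (Write after Read): a later instruction writes a register before an earlier instruction has read it. Generally not relevant in simple in-order pipelines, because operands are read in ID/RF, before any later instruction can reach WB.
[Figure: two overlapping instructions in the IF, ID/RF, EX, MEM, WB pipeline]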
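The three hazard types above depend only on which registers each instruction reads and writes. The sketch below classifies the hazards between an earlier and a later instruction. `Instr` and `classify_hazards` are hypothetical helpers introduced for illustration; they look at register names only and ignore memory dependences.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Instr:
    """One instruction: destination register (None for stores/branches) and source registers."""
    text: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


def classify_hazards(first: Instr, second: Instr) -> Set[str]:
    """Return the data-hazard types that `second` has with respect to the earlier `first`."""
    hazards = set()
    # RAW: the later instruction reads a register the earlier one writes.
    if first.dest is not None and first.dest in second.srcs:
        hazards.add("RAW")
    # WAW: both instructions write the same register.
    if first.dest is not None and first.dest == second.dest:
        hazards.add("WAW")
    # WAR: the later instruction writes a register the earlier one reads.
    if second.dest is not None and second.dest in first.srcs:
        hazards.add("WAR")
    return hazards


add = Instr("add r1, r2, r3", dest="r1", srcs=("r2", "r3"))
store = Instr("store r1, 0(r4)", dest=None, srcs=("r1", "r4"))
div = Instr("div r1, r3, r4", dest="r1", srcs=("r3", "r4"))
add2 = Instr("add r1, r10, r5", dest="r1", srcs=("r10", "r5"))

print(classify_hazards(add, store))  # {'RAW'}
print(classify_hazards(div, add2))   # {'WAW'}
```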
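Remedies
Bypass values (data forwarding): RAW hazards are tackled this way, by routing a result from a later pipeline stage straight to the input of EX instead of waiting for WB.
Not all RAW hazards can be solved by forwarding. E.g. the load delay: a loaded value exists only at the end of MEM, too late for the EX stage of an immediately following instruction. What about divide, which spends many cycles in EX?
What is the solution? Static compiler techniques: schedule independent instructions between the producer and the consumer.
[Figure: two overlapping instructions in the IF, ID/RF, EX, MEM, WB pipeline]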
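The effect of forwarding, and why the load delay survives it, can be counted in stall cycles. This is a sketch under stated assumptions: a classic 5-stage pipeline (IF, ID/RF, EX, MEM, WB) in which the register file is written in the first half of a cycle and read in the second half, single-cycle EX, and a hypothetical helper `stall_cycles`. Multi-cycle operations such as divide are not modelled.

```python
def stall_cycles(distance: int, producer_is_load: bool, forwarding: bool) -> int:
    """Stall cycles a dependent instruction needs in a classic 5-stage pipeline.

    distance: 1 if the consumer immediately follows the producer,
    2 if one instruction sits in between, and so on.
    """
    # Cycle numbering: the producer is fetched in cycle 1, so it is in EX in
    # cycle 3, MEM in cycle 4 and WB in cycle 5; the consumer is fetched in
    # cycle 1 + distance, decodes in 2 + distance and executes in 3 + distance.
    if not forwarding:
        # The consumer may read the register in the same cycle as the WB write.
        return max(0, 5 - (2 + distance))
    if producer_is_load:
        # The loaded value exists only after MEM (end of cycle 4).
        return max(0, 5 - (3 + distance))
    # An ALU result is forwarded from the end of EX into the next EX.
    return 0


# add r1, r2, r3 followed directly by an instruction that reads r1
print(stall_cycles(1, producer_is_load=False, forwarding=False))  # 2
print(stall_cycles(1, producer_is_load=False, forwarding=True))   # 0
# load-use: lw r1, ... followed directly by a reader of r1
print(stall_cycles(1, producer_is_load=True, forwarding=True))    # 1
```

The last case is the load delay: one bubble remains unless the compiler places an independent instruction between the load and its use (distance 2).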
Can we do better? Execute independent instructions out of order? What do we require for this? Consider:
lw r4, 0(r6)  # cache miss: takes time
addi r5, r4, 0x20
and r10, r5, r19
xor r26, r2, r7
sub r20, r26, r2
Fetch more instructions beyond the stalled ones. Instructions should still be committed in order. Memory instructions? Is the dependency clear?
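For the sequence above, a register-level dependence walk shows which instructions must wait for the missing load and which could run during the miss. The program encoding and the function `waiting_on` are hypothetical. The sketch tracks registers only, which is exactly why memory instructions are harder: whether a load and an earlier store touch the same address is not visible from register names.

```python
# Each entry: (assembly text, destination register, source registers).
PROGRAM = [
    ("lw r4, 0(r6)", "r4", ["r6"]),  # cache miss: the result arrives late
    ("addi r5, r4, 0x20", "r5", ["r4"]),
    ("and r10, r5, r19", "r10", ["r5", "r19"]),
    ("xor r26, r2, r7", "r26", ["r2", "r7"]),
    ("sub r20, r26, r2", "r20", ["r26", "r2"]),
]


def waiting_on(program, slow_index):
    """Indices of later instructions that (transitively) need program[slow_index]'s result."""
    not_ready = {program[slow_index][1]}  # registers whose value is still pending
    blocked = []
    for i in range(slow_index + 1, len(program)):
        _text, dest, srcs = program[i]
        if any(s in not_ready for s in srcs):
            blocked.append(i)
            not_ready.add(dest)  # its result is now pending too
        else:
            # An independent write gives this register a ready value again.
            not_ready.discard(dest)
    return blocked


blocked = waiting_on(PROGRAM, 0)
ready = [PROGRAM[i][0] for i in range(1, len(PROGRAM)) if i not in blocked]
print(ready)  # ['xor r26, r2, r7', 'sub r20, r26, r2']
```

The xor and sub can execute while the miss is serviced, but they must still commit after the lw, addi and and to keep in-order commit.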
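The WAW hazard: is it unavoidable? What is the reason for such a hazard? The two instructions share only a register name, not a value.
Register renaming: provide more physical registers than logical registers, and map each logical destination register to a physical register that is available when the instruction is decoded.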
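A minimal rename-table sketch, assuming 32 logical and 48 physical registers (numbers chosen for illustration) and a simple FIFO free list; `RenameTable` is a hypothetical class. Returning physical registers to the free list when instructions commit is deliberately left out.

```python
from collections import deque


class RenameTable:
    """Maps logical registers to physical registers, allocating a fresh one per write."""

    def __init__(self, num_logical: int, num_physical: int):
        assert num_physical > num_logical
        # Initially logical register rK lives in physical register pK.
        self.mapping = {f"r{k}": f"p{k}" for k in range(num_logical)}
        self.free = deque(f"p{k}" for k in range(num_logical, num_physical))

    def rename(self, dest, srcs):
        """Rename one decoded instruction; returns (physical dest, physical sources)."""
        # Sources are looked up before the destination is remapped,
        # so an instruction like add r1, r1, r2 reads the old r1.
        phys_srcs = [self.mapping[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free physical register: decode must stall")
        phys_dest = self.free.popleft()
        self.mapping[dest] = phys_dest
        return phys_dest, phys_srcs


rt = RenameTable(num_logical=32, num_physical=48)
print(rt.rename("r1", ["r3", "r4"]))   # div r1, r3, r4  -> ('p32', ['p3', 'p4'])
print(rt.rename("r1", ["r10", "r5"]))  # add r1, r10, r5 -> ('p33', ['p10', 'p5'])
```

The div and the add now write different physical registers, so their completion order no longer matters; later readers of r1 are simply mapped to p33.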
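Control Hazard
Branch delay slot: in bnz r1, label / add r1, r2, r3 / label: sub r1, r2, r3, the add in the slot after the branch always executes, which saves a one-cycle stall. Fetching on the negative clock edge saves another cycle.
Deeper pipelines resolve branches many stages after fetch, so such static compiler techniques would not work: one or two delay slots cannot hide the penalty.
[Figure: three overlapping instructions in the IF, ID/RF, EX, MEM, WB pipeline]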
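A back-of-the-envelope calculation shows why delay slots stop helping as pipelines deepen. This sketch assumes an in-order pipeline with no prediction that stalls until each branch resolves, a CPI of 1 otherwise, and a hypothetical helper `branch_cpi`; the 20% branch frequency below is an illustrative figure, not a measurement.

```python
def branch_cpi(branch_fraction: float, resolve_stage: int, delay_slots_filled: int = 0) -> float:
    """CPI of an in-order pipeline that stalls after every branch until it resolves."""
    # Resolving in stage k (with IF as stage 1) costs k - 1 fetch slots;
    # each delay slot filled with useful work recovers one of them.
    penalty = max(0, resolve_stage - 1 - delay_slots_filled)
    return 1.0 + branch_fraction * penalty


print(branch_cpi(0.2, resolve_stage=2, delay_slots_filled=1))  # 1.0
print(branch_cpi(0.2, resolve_stage=10, delay_slots_filled=1))
```

With resolution in stage 2, a single filled delay slot removes the whole penalty. With the same branch mix resolved in stage 10, the CPI rises to 1 + 0.2 × 8 = 2.6 despite the delay slot.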
What can be done? Predict whether the branch will be taken or not. The history of each branch is saved and the prediction is made accordingly. Example: the bimodal predictor. Branch prediction is very important and complex these days because deep, superscalar pipelines fetch many instructions before a branch resolves, so every misprediction throws away a lot of work.
Bimodal predictor
Entry: a 2-bit saturating counter
Index: least significant bits of the instruction address
Prediction: combinational (read directly from the table)
Update: when the branch is resolved
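The four properties above fit in a few lines. This is a sketch, not a hardware description: the table size (1024 entries), the initial counter value, and dropping the two low address bits (assuming 4-byte-aligned instructions) are all assumptions, and `BimodalPredictor` is a hypothetical name.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by low-order bits of the branch address."""

    def __init__(self, index_bits: int = 10):
        self.mask = (1 << index_bits) - 1
        # Counter values 0, 1 predict not taken; 2, 3 predict taken.
        # Starting every entry at 1 (weakly not taken) is an arbitrary choice.
        self.counters = [1] * (1 << index_bits)

    def _index(self, pc: int) -> int:
        # Skip the 2 low bits, which are always 0 for 4-byte-aligned instructions.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)  # saturate at 3
        else:
            self.counters[i] = max(0, self.counters[i] - 1)  # saturate at 0


# A loop branch at a made-up address: taken 9 times, then falls through; the loop runs twice.
bp = BimodalPredictor()
misses = 0
for taken in ([True] * 9 + [False]) * 2:
    if bp.predict(0x400) != taken:
        misses += 1
    bp.update(0x400, taken)
print(misses)  # 3
```

The 2-bit hysteresis is what pays off here: the single not-taken outcome at loop exit only drops the counter from 3 to 2, so the branch is still predicted taken when the loop is entered again.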
Remedies to Structural hazards
Simplest solution: increase resources, e.g. more functional units (the available silicon allows us to do this).
Another solution: pipeline the functional units.
Pipelining is not always possible or feasible.
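The two remedies can be compared by counting cycles for a batch of independent operations. The sketch assumes a fixed latency per operation, operations spread evenly over identical units, and a pipelined unit that accepts a new operation every cycle; `cycles_for_ops` and the numbers in the example (8 multiplies, latency 4) are hypothetical.

```python
def cycles_for_ops(n_ops: int, latency: int, pipelined: bool, units: int = 1) -> int:
    """Cycles to finish n independent operations on `units` identical functional units."""
    if n_ops == 0:
        return 0
    per_unit = -(-n_ops // units)  # ceiling division: ops handled by the busiest unit
    if pipelined:
        # One op enters per cycle; the last finishes `latency` cycles after it enters.
        return (per_unit - 1) + latency
    # An unpipelined unit is busy for the whole latency of each op.
    return per_unit * latency


print(cycles_for_ops(8, latency=4, pipelined=False))           # 32
print(cycles_for_ops(8, latency=4, pipelined=False, units=2))  # 16
print(cycles_for_ops(8, latency=4, pipelined=True))            # 11
```

Here one pipelined unit beats two unpipelined ones, at the cost of pipeline registers inside the unit, which is not always possible or feasible.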
Superscalar execution!
Execute more than one instruction every cycle.
Make better use of the functional units.
Fetch and commit more than one instruction every cycle.
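How much a wider machine helps depends on dependences between neighbouring instructions. The sketch below counts issue cycles for an in-order machine of a given width under strong simplifying assumptions: every instruction has a 1-cycle latency with full forwarding, functional units are unlimited, and there are no cache misses. `issue_cycles` is a hypothetical helper; only a dependence on an instruction issued in the same cycle splits a group.

```python
def issue_cycles(program, width):
    """Cycles an in-order superscalar of the given width needs to issue `program`.

    program: list of (destination register, source registers) pairs.
    """
    cycles = 0
    i = 0
    while i < len(program):
        cycles += 1
        written_this_cycle = set()
        issued = 0
        while i < len(program) and issued < width:
            dest, srcs = program[i]
            # RAW on a same-cycle result must wait; same-cycle WAW pairs are
            # also kept apart, a conservative choice.
            if dest in written_this_cycle or any(s in written_this_cycle for s in srcs):
                break
            written_this_cycle.add(dest)
            issued += 1
            i += 1
    return cycles


# The sequence from the out-of-order slide, ignoring the cache miss.
seq = [("r4", ["r6"]), ("r5", ["r4"]), ("r10", ["r5", "r19"]),
       ("r26", ["r2", "r7"]), ("r20", ["r26", "r2"])]
print(issue_cycles(seq, width=1))  # 5
print(issue_cycles(seq, width=2))  # 4
```

The dependence chain lw, addi, and leaves a 2-wide machine only one cycle ahead of a scalar one, which is one reason wide machines also fetch further ahead and execute out of order.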
Memory Organization in processors
Caches inside the chip: faster because they are ‘closer’ to the core and built from SRAM cells.
They contain recently used data.
They contain data in ‘blocks’ (cache lines) of consecutive bytes.
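Because data is held in blocks, a cache splits every address into a block offset, a set index and a tag. The sketch assumes a power-of-two block size and number of sets; the 64-byte blocks and 128 sets in the example are hypothetical parameters, and `split_address` is an illustrative helper.

```python
def split_address(addr: int, block_size: int, num_sets: int):
    """Split a byte address into (tag, set index, block offset).

    Assumes block_size and num_sets are powers of two.
    """
    offset_bits = block_size.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    offset = addr & (block_size - 1)                 # byte within the block
    index = (addr >> offset_bits) & (num_sets - 1)   # which set to look in
    tag = addr >> (offset_bits + index_bits)         # identifies the block within the set
    return tag, index, offset


print(split_address(0x12345, block_size=64, num_sets=128))  # (9, 13, 5)
# 0x12340 and 0x1237F differ only in the offset: they live in the same block.
print(split_address(0x12340, 64, 128), split_address(0x1237F, 64, 128))
```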
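Rationale behind caches
Principle of spatial locality: data near a recently accessed address is likely to be accessed soon.
Principle of temporal locality: recently accessed data is likely to be accessed again soon.
Replacement policy (LRU, LFU, etc.): which block to evict when the cache or set is full.
Principle of inclusivity: in an inclusive hierarchy, every block held in a higher-level cache is also present in the level below it.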
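Both kinds of locality and LRU replacement can be seen in a small simulation. The sketch models a fully associative cache (an assumption; real caches are usually set-associative) using Python's `collections.OrderedDict` to keep blocks in recency order; `LRUCache` and the sizes in the example are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache of `capacity` blocks with LRU replacement."""

    def __init__(self, capacity: int, block_size: int):
        self.capacity = capacity
        self.block_size = block_size
        self.blocks = OrderedDict()  # block number -> None, least recently used first
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        block = addr // self.block_size
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        self.blocks[block] = None
        return False


# Walk 16 four-byte words twice with 16-byte blocks and room for 4 blocks.
cache = LRUCache(capacity=4, block_size=16)
for _ in range(2):
    for addr in range(0, 64, 4):
        cache.access(addr)
print(cache.hits, cache.misses)  # 28 4
```

The first pass misses once per block and then hits on the next three words (spatial locality); the second pass hits on everything because the blocks are still resident (temporal locality).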
References http://en.wikipedia.org/wiki/Hazard_(computer_architecture) http://www.csee.umbc.edu/~plusquel/611/slides/chap3_3.html