Published by Trevin Stradford. Modified over 9 years ago.
2
Instruction Set Issues
– MIPS is easy
  – Instructions are committed only at the MEM/WB transition
– Other architectures are more difficult
  – Instructions may update state early (before they are known to complete)
  – FP is more difficult
  – Memory-updating operations (e.g. string moves) change memory part-way through execution
3
Instruction Set Issues (cont.)
– Difficult architectural features
  – “Odd” bits of state (e.g. condition codes)
    – May need saving/restoring on exceptions
  – Implicitly set condition codes
    – Complicate branch resolution
    – Explicit setting helps here (still a RAW hazard)
  – Multicycle operations
    – Widely differing execution times, many potential data hazards, etc.
4
Instruction Set Issues
– The VAX suffers from many of these problems
– Solution: pipeline the microcode
– Intel 32-bit 80x86 processors since 1995 use a similar approach, translating instructions into micro-operations and pipelining those
5
A.5. Handling Multicycle Operations
– MIPS: FP operations
  – Long latency (EX stage repeated for several cycles)
  – Several functional units
  – Structural hazards
  – Data hazards
6
MIPS: FP Design
– Four functional units:
  – Integer ALU, as before
  – FP multiplier, also used for integer multiplication
  – FP adder, for addition, subtraction, and conversion
  – FP divider, also used for integer division
7
MIPS Design with FP Units
8
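MIPS Multicycle Operations
Unit              Latency   Initiation interval
Integer ALU       0         1
Memory (loads)    1         1
FP add            3         1
FP multiply       6         1
FP divide         24        25
Latency: intervening cycles between an instruction that produces a result and one that uses it. Initiation interval: cycles that must elapse between issuing two operations to the same unit.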
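The initiation-interval column turns directly into structural stalls. The following is a minimal sketch that assumes in-order issue of at most one operation per cycle and ignores data hazards. The unit labels ("int", "fdiv", ...) are hypothetical names, and integer multiply/divide would map onto the FP units as described above. Only the unpipelined divider ever blocks a later operation.

```python
# Initiation intervals from the table above: cycles that must pass between
# issuing two operations to the same unit. Only the FP divider is unpipelined.
# The unit labels are hypothetical names chosen for this sketch.
INIT_INTERVAL = {"int": 1, "load": 1, "fadd": 1, "fmul": 1, "fdiv": 25}


def issue_cycles(ops, start=1):
    """Return the issue cycle of each op, issued in order, at most one per cycle.

    Only structural hazards on a busy unit are modelled; data hazards are ignored.
    """
    next_free = {}  # unit -> first cycle at which it accepts a new operation
    cycles = []
    cycle = start
    for op in ops:
        cycle = max(cycle, next_free.get(op, start))  # stall while the unit is busy
        cycles.append(cycle)
        next_free[op] = cycle + INIT_INTERVAL[op]
        cycle += 1  # the next op can issue no earlier than the following cycle
    return cycles


# Two divides separated by an add: the second divide waits for the divider.
print(issue_cycles(["fdiv", "fadd", "fdiv"]))
```

This prints `[1, 2, 26]`. The add issues right away because it uses a different unit, but the second divide stalls until 25 cycles after the first.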
9
Hazards
– Divides
  – Structural hazard (the divider is not pipelined)
– Multiple register writes possible in a cycle
– Out-of-order completion
  – WAW hazards
  – Exception-handling complications
– RAW hazards increase (longer latencies mean longer stalls)
10
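Potential RAW Hazards
Example (SPARC syntax):
    ldd   [%fp-8], %f4
    fmuld %f4, %f6, %f0
    faddd %f0, %f8, %f2
    std   %f2, [%fp-16]

Cycle   1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17
ld      F  D  X  M  W
mul        F  D  s  X  X  X  X  X  X  X  M  W
add           F  s  D  s  s  s  s  s  s  X  X  X  X  M  W
st               s  F  s  s  s  s  s  s  D  X  s  s  s  M
(F = IF, D = ID, X = EX stage, M = MEM, W = WB, s = stall)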
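A small timing model shows where the RAW stalls in this example come from. It is a simplified sketch, not the real pipeline control, and rests on these assumptions:
- issue is in order, with full forwarding;
- an FP unit with latency L has L + 1 EX stages;
- a consumer can start EX once `producer_EX + latency + 1` is reached;
- a store needs its data only at MEM;
- at most one instruction uses MEM per cycle.

The `Instr` class and `schedule` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Latencies from the multicycle-operations table: cycles a dependent
# instruction must wait behind its producer (0 = can follow back-to-back).
LATENCY = {"int": 0, "load": 1, "fadd": 3, "fmul": 6, "fdiv": 24}
# Cycles spent in EX. Assumption: an FP unit with latency L has L + 1 stages;
# loads and stores spend one EX cycle computing the address.
EX_CYCLES = {"int": 1, "load": 1, "store": 1, "fadd": 4, "fmul": 7, "fdiv": 25}


@dataclass
class Instr:
    name: str
    kind: str                          # key into EX_CYCLES
    dest: Optional[str] = None         # register written (None for stores)
    srcs: Tuple[str, ...] = ()         # registers needed at the start of EX
    store_data: Optional[str] = None   # register a store needs at MEM


def schedule(prog):
    """Return {name: (first EX cycle, MEM cycle, WB cycle or None)}.

    Simplified in-order pipeline with full forwarding: IF and ID take one
    cycle each, and at most one instruction uses MEM in any cycle.
    """
    producer = {}      # register -> (first EX cycle, kind) of its latest writer
    mem_busy = set()
    result = {}
    prev_ex = 2        # so the first instruction is IF=1, ID=2, EX=3
    for ins in prog:
        ex = prev_ex + 1                      # in order: one issue per cycle
        for r in ins.srcs:                    # RAW stall until operands ready
            if r in producer:
                p_ex, p_kind = producer[r]
                ex = max(ex, p_ex + LATENCY[p_kind] + 1)
        mem = ex + EX_CYCLES[ins.kind]
        if ins.store_data in producer:        # store data is needed only at MEM
            p_ex, p_kind = producer[ins.store_data]
            mem = max(mem, p_ex + LATENCY[p_kind] + 1)
        while mem in mem_busy:                # structural hazard on MEM
            mem += 1
        mem_busy.add(mem)
        wb = mem + 1 if ins.dest else None
        if ins.dest:
            producer[ins.dest] = (ex, ins.kind)
        result[ins.name] = (ex, mem, wb)
        prev_ex = ex
    return result


prog = [
    Instr("ld", "load", dest="f4"),
    Instr("mul", "fmul", dest="f0", srcs=("f4", "f6")),
    Instr("add", "fadd", dest="f2", srcs=("f0", "f8")),
    Instr("st", "store", store_data="f2"),
]
for name, cycles in schedule(prog).items():
    print(name, cycles)
```

For the four-instruction example, the model reproduces the cycle numbers in the timing diagram:
- mul starts EX in cycle 5;
- add starts EX in cycle 12;
- st reaches MEM in cycle 17, one cycle after add's MEM.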
11
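Multiple Writes
– Up to four instructions may need to write the register file in the same cycle
– Solution: track register-file writes in ID
  – Stall at instruction issue
  – Simpler: all stalls are detected and inserted at one point
– Alternatively: stall at MEM or WB
  – Stall the instruction with the shorter latency (the longer-latency result is more likely to be what a RAW-stalled instruction is waiting for)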
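One way to "track writes in ID" is a shift register of write-port reservations. Each entry says whether an already-issued instruction will write the register file that many cycles from now. This is a minimal sketch under stated assumptions:
- `WritePortTracker` and `issue_schedule` are hypothetical names;
- each instruction is described only by its ID-to-WB distance;
- data hazards are ignored.

```python
class WritePortTracker:
    """Shift-register model of register-file write-port reservations.

    slots[k] is True when an already-issued instruction will write the
    register file k cycles from now.
    """

    def __init__(self, depth=32):
        self.slots = [False] * depth

    def busy(self, cycles_to_wb):
        return self.slots[cycles_to_wb]

    def reserve(self, cycles_to_wb):
        self.slots[cycles_to_wb] = True

    def tick(self):
        # One clock edge: every reservation moves one cycle closer.
        self.slots = self.slots[1:] + [False]


def issue_schedule(distances):
    """Cycle in which each instruction leaves ID, in program order.

    distances[i] = cycles from ID to WB for instruction i. An instruction
    stalls in ID while its WB cycle is already reserved.
    """
    tracker = WritePortTracker()
    cycle, issued = 1, []
    for d in distances:
        while tracker.busy(d):   # write-port conflict: stall in ID for a cycle
            tracker.tick()
            cycle += 1
        tracker.reserve(d)
        issued.append(cycle)
        tracker.tick()
        cycle += 1
    return issued


# Hypothetical sequence: FP multiply, two integer ops, FP add.
# Assumed ID-to-WB distances: 9 (M1-M7, MEM, WB), 3, 3, 6 (A1-A4, MEM, WB).
print(issue_schedule([9, 3, 3, 6]))
```

The FP add would reach WB in the same cycle as the multiply (cycle 10). It is therefore held in ID for one cycle, and the result is `[1, 2, 3, 5]`.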
12
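WAW Hazards
Example:
    faddd %f4, %f6, %f2
    …                      ! integer op
    ldd   [%fp-8], %f2

Cycle   1  2  3  4  5  6  7  8
faddd   F  D  X  X  X  X  M  W
…          F  D  X  M  W
ldd           F  D  X  M  W
ldd writes %f2 in cycle 7, one cycle before faddd does, so %f2 ends up holding the faddd result instead of the loaded value.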
13
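WAW Hazards (cont.)
– Rare: the faddd result is overwritten without ever being read
  – But compiler scheduling can produce such unlikely sequences, so the hazard must still be caught
– Solutions:
  – Stall issue of ldd until faddd has entered MEM
  – Prevent faddd from writing its result (ldd can then issue immediately)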
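Detecting the hazard amounts to comparing write-back cycles of instructions with the same destination register. This is a sketch that assumes the WB cycles are already known (here, faddd in cycle 8 and ldd in cycle 7, as in the example). `waw_hazards` is a hypothetical helper. The stall count it returns corresponds to the first solution (delay the later writer). The second solution would instead cancel the earlier instruction's write.

```python
def waw_hazards(instrs):
    """Find WAW hazards in a schedule.

    instrs: (name, dest, wb_cycle) tuples in program order.
    Returns (earlier, later, stall) triples where the later instruction would
    write the same register no later than the earlier one; stall is how many
    cycles the later instruction must be delayed so its write lands last.
    """
    hazards = []
    for i, (name_i, dest_i, wb_i) in enumerate(instrs):
        for name_j, dest_j, wb_j in instrs[i + 1:]:
            if dest_i is not None and dest_i == dest_j and wb_j <= wb_i:
                hazards.append((name_i, name_j, wb_i - wb_j + 1))
    return hazards


# WB cycles from the WAW example; the integer op's destination is irrelevant here.
example = [("faddd", "f2", 8), ("int op", None, 6), ("ldd", "f2", 7)]
print(waw_hazards(example))
```

The only hazard reported is faddd/ldd. Delaying ldd by 2 cycles makes its write land in cycle 9, after faddd's.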
14
Maintaining Precise Exceptions
– Out-of-order completion:
    fdivd %f2, %f4, %f0
    faddd %f10, %f8, %f10
    fsubd %f12, %f14, %f12
  – faddd and fsubd complete long before fdivd
  – fsubd may raise an exception when faddd has completed but fdivd has not
  – The exception is no longer precise
15
Maintaining Precise Exceptions
– It may be very difficult to handle exceptions precisely
  – E.g. faddd has already overwritten one of its operands (%f10), so it cannot simply be re-executed
– Four solutions:
  – Accept imprecise exceptions
    – But precise exceptions are needed for virtual memory and IEEE FP
    – Compromise: allow switching between precise and imprecise modes
16
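Maintaining Precise Exceptions
Solutions (cont.)
– Buffer results until all earlier instructions complete
  – Buffers may grow very large, and extensive forwarding is required
  – History file: keeps the original register values, which are restored on an exception
  – Future file: holds the new register values; the main register file is updated once all earlier instructions complete
– Software completion: record the instructions in flight so the exception handler can execute the earlier, unfinished instructions and get “up to date” before returning from the exception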
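The history-file variant can be modelled in a few lines. Writes go straight to the registers, possibly out of order, and each write logs the value it overwrote, tagged with the instruction's program-order number. This is a software sketch, not a hardware design, and relies on these assumptions:
- `HistoryRegisterFile` and its methods are hypothetical names;
- the register values are made up;
- fdivd is assumed, for illustration, to be the instruction that faults (e.g. a divide by zero) after faddd and fsubd have completed.

```python
class HistoryRegisterFile:
    """Register file plus a history file (a sketch).

    Results are written as soon as they complete, possibly out of order; the
    value each write overwrote is logged so it can be restored if an earlier
    instruction raises an exception.
    """

    def __init__(self, regs):
        self.regs = dict(regs)
        self.history = []  # (seq, reg, old_value) in the order writes happened

    def write(self, seq, reg, value):
        # seq is the writing instruction's position in program order.
        self.history.append((seq, reg, self.regs[reg]))
        self.regs[reg] = value

    def retire_through(self, seq):
        # Instructions up to seq are done; their writes can no longer be undone.
        self.history = [h for h in self.history if h[0] > seq]

    def rollback_after(self, seq):
        # Exception in instruction seq: undo, newest first, every write made by
        # a later instruction, so the state is precise at seq.
        for s, reg, old in reversed(self.history):
            if s > seq:
                self.regs[reg] = old
        self.history = [h for h in self.history if h[0] <= seq]


# fdivd (seq 0), faddd (seq 1), fsubd (seq 2); values are illustrative only.
rf = HistoryRegisterFile({"f0": 0.0, "f10": 1.0, "f12": 2.0})
rf.write(1, "f10", 1.5)   # faddd completes first and overwrites its source f10
rf.write(2, "f12", 1.75)  # fsubd completes
rf.rollback_after(0)      # fdivd raises an exception
print(rf.regs)
```

After the rollback, %f10 holds its original value again. That recovers exactly the operand faddd had destroyed. A future file would take the opposite approach: keep the architectural registers untouched and copy new values in only once all earlier instructions complete.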
17
Maintaining Precise Exceptions
Solutions (cont.)
– Hybrid scheme
  – An instruction issues only when it is certain that all preceding instructions will complete without causing an exception
  – May require stalling the pipeline
18
Performance of the MIPS FP Pipeline
– Structural hazards (divide unit)
  – Very low: 0–2 cycles per FP operation
– RAW hazards (stall cycles per FP operation)
  – Divide: 12–24 cycles, average 14.2
  – Add: 0.7–2.3 cycles, average 1.7
  – In general, about 0.5 × latency
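Dividing each average RAW stall by the unit latency from the multicycle-operations table (24 for divide, 3 for add) gives a quick check of the rule of thumb:

```latex
\frac{14.2}{24} \approx 0.59 \ (\text{divide}), \qquad \frac{1.7}{3} \approx 0.57 \ (\text{add})
```

In these figures the stalls run slightly above half the latency.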
19
Overall MIPS FP Performance
– Stalls per instruction
  – 0.65–1.21 cycles
  – Average: 0.87
  – 82% from FP RAW hazards
20
A.6. Putting It All Together
MIPS R4000 Pipeline
– 64-bit instruction set
– Eight-stage pipeline
  – Superpipelining: instruction and data cache accesses are split across several stages
  – IF + IS: instruction fetch
  – RF: decode/register fetch
  – EX: execution
  – DF + DS + TC: data cache access and tag check
  – WB: write back
21
MIPS R4000 Pipeline
– Performance
  – Load delay: two cycles
  – Branch delay: three cycles
    – Delayed branch (one cycle)
    – Predict-not-taken for the remaining two cycles, with annulling
– Increased forwarding requirements
  – Three stages (DF, DS, TC) now lie between EX and WB
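Both delays follow from stage positions. A consumer issued k cycles after a producer reaches a given stage k cycles later than the producer did. The gap that instructions or stalls must fill is therefore the difference between two stage indices. This is a sketch under two assumptions, both consistent with the figures above:
- load data can be forwarded at the end of DS;
- the branch condition is resolved in EX.

`delay_slots` is a hypothetical helper, and the classic five-stage MIPS pipeline is included only for comparison.

```python
R4000 = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]
CLASSIC = ["IF", "ID", "EX", "MEM", "WB"]   # five-stage MIPS pipeline, for comparison


def delay_slots(stages, ready_after, needed_in):
    """Cycles that must separate a producer from its consumer.

    The producer's value is available at the end of stage `ready_after`; the
    consumer needs it at the start of stage `needed_in`. A consumer issued k
    cycles later is in `needed_in` at cycle k + index(needed_in), which must be
    at least index(ready_after) + 1, so k - 1 = index(ready_after) -
    index(needed_in) instructions (or stall cycles) must fill the gap.
    """
    return stages.index(ready_after) - stages.index(needed_in)


print(delay_slots(R4000, "DS", "EX"))     # load data ready after DS, used in EX
print(delay_slots(R4000, "EX", "IF"))     # branch resolved in EX, target fetched in IF
print(delay_slots(CLASSIC, "MEM", "EX"))  # classic one-cycle load delay
```

The three calls give 2, 3 and 1. These match the two-cycle load delay and the three-cycle branch delay of the R4000, and the single load-delay cycle of the five-stage pipeline.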
22
MIPS R4000 Pipeline
– Floating point
  – Three functional units: divider, multiplier, adder
    – Built from shared components (8 sub-units)
  – Latency: 2–112 cycles
  – Initiation interval: 1–111 cycles
  – Complicated stall handling
23
MIPS R4000 Pipeline
– Performance:
  – CPI between 1.2 and 2.8 for SPEC92 benchmarks
  – Average: 2.0
    – Integer: 1.54
    – FP: 2.48
  – Integer applications: stalls mainly from branch delays
  – FP applications: stalls mainly from FP data hazards (RAW)