1
PowerPC 604 Superscalar Microprocessor
IBM, Motorola, Apple
2
11/13
3
PPC604e Overview
A member of the RISC PowerPC family. The PowerPC architecture provides:
• 32-bit effective (logical) addresses
• Integer data types of 8, 16, and 32 bits
• Floating-point data types of 32 and 64 bits (single- and double-precision, respectively)
A superscalar processor: it can issue up to four instructions per clock, and up to seven instructions can execute in parallel.
4
Overview: 604e has 7 units
The 604e has seven independent execution units that operate in parallel:
• Floating-point unit (FPU)
• Branch processing unit (BPU)
• Condition register unit (CRU)
• Load/store unit (LSU)
• Three integer units (IUs):
  • Two single-cycle integer units (SCIUs)
  • One multiple-cycle integer unit (MCIU)
5
Three-stage pipelined floating-point unit (FPU)
• Fully IEEE 754 compliant FPU
• Supports a non-IEEE mode for time-critical operations
• Fully pipelined, single-pass double-precision design
• Two-entry reservation station to minimize stalls
• Thirty-two 64-bit FPRs for single- or double-precision operands
6
BPU & CRU
Branch processing unit (BPU) with dynamic branch prediction:
• Two-entry reservation station
• Out-of-order execution through two branches
• 64-entry fully associative branch target address cache (BTAC)
• 512-entry branch history table (BHT) with two prediction bits per entry
Condition register unit (CRU)
7
Condition resolution takes time
8
Solution: Branch speculation
9
Branch History Table (BHT): Table of Predictors
• Each branch is given a predictor; the BHT is a table of these predictors
• A predictor could be 1 bit or more; most schemes use at least 2-bit predictors
• The table is indexed by the PC address of the branch
• Performance = ƒ(accuracy, cost of misprediction)
• A misprediction flushes the reorder buffer
• In the fetch stage of a branch: use the predictor to make a prediction
• When the branch completes: update the corresponding predictor
[Figure: the branch PC indexes a table of predictors, Predictor 0 through Predictor 7]
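The predict-at-fetch, update-at-completion cycle described above can be sketched as a table of 2-bit saturating counters. This is a minimal illustration, not the 604's actual logic: the class and function names, the initial counter value, and the index function are all assumptions made for the example; only the 512-entry size and the two bits per entry come from the slides.

```python
class BranchHistoryTable:
    """Table of 2-bit saturating counters indexed by branch PC (illustrative)."""

    def __init__(self, entries=512):
        self.entries = entries
        # Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
        # Starting at 1 (weakly not taken) is an assumption of this sketch.
        self.counters = [1] * entries

    def _index(self, pc):
        # Instructions are 4 bytes, so drop the two low bits before indexing.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        """Fetch stage: use the predictor to make a prediction."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Branch completes: update the corresponding predictor."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(bht, pc, outcomes):
    """Run one branch through a sequence of outcomes and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if bht.predict(pc) != taken:
            misses += 1
        bht.update(pc, taken)
    return misses
```

With two bits, a loop branch that is taken many times and then falls through once is mispredicted only once per pass after warm-up, because the single not-taken outcome moves the counter from 3 to 2 rather than flipping the prediction.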
10
BTB: Branch Address at Same Time as Prediction
Branch Target Buffer (BTB): the address of the branch is used as an index to get the prediction AND the branch address (if taken).
• The PC of the instruction in FETCH is compared (=?) with the stored branch PCs; each entry holds a branch PC, a predicted PC, and prediction state bits
• Yes: the instruction is a branch; use the predicted PC as the next PC
• No: the branch is not predicted; proceed normally (Next PC = PC+4)
• Only predicted-taken branches and jumps are held in the BTB
• The next PC is determined before the branch is fetched and decoded
• Later: check the prediction; if it was wrong, kill the instruction and update the BPb
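The lookup described above can be sketched as follows. This is a simplified model under stated assumptions: the class and method names are hypothetical, the buffer is an unbounded dictionary rather than a fixed-size associative array, and the prediction state bits are omitted so that an entry's presence alone means "predicted taken".

```python
class BranchTargetBuffer:
    """Maps a branch PC to its predicted target PC (illustrative sketch)."""

    def __init__(self):
        # Only predicted-taken branches are held, keyed by branch PC.
        self.targets = {}

    def next_pc(self, pc):
        """Fetch stage: return (next_pc, hit) for the instruction at pc."""
        if pc in self.targets:
            return self.targets[pc], True   # hit: use the predicted PC
        return pc + 4, False                # miss: proceed sequentially

    def resolve(self, pc, taken, target):
        """Branch resolved: update the buffer and report a misprediction.

        Returns True when the next PC chosen at fetch was wrong, in which
        case the instructions fetched after the branch must be killed.
        """
        predicted, _ = self.next_pc(pc)
        actual = target if taken else pc + 4
        if taken:
            self.targets[pc] = target
        else:
            self.targets.pop(pc, None)
        return predicted != actual
```

The first taken execution of a branch misses and is recorded; later fetches of the same PC then receive the target address in the same step as the prediction, which is the point of the slide's title.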
12
PPC604 Pipeline
13
PowerPC604 Pipeline overview
• Instruction fetch (IF): loads the decode queue (DEQ) with instructions from the I-cache and determines the next instruction address
• Instruction decode (ID): performs time-critical decoding on instructions in the dispatch queue (DISQ)
• Instruction dispatch (DS):
  • Dispatches up to four instructions per cycle, in order, at most one per functional unit
  • Performs non-time-critical instruction decoding
  • Determines when an instruction can be dispatched to the execution units
  • At the end of DS, instructions and their operands are latched into the execution input latches or into the unit's reservation station
  • Rename registers and reorder buffer entries are allocated
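The dispatch rule above (in order, at most four per cycle, at most one instruction per functional unit) can be sketched as a small selection function. The unit labels and the function name are assumptions for illustration, and the sketch deliberately ignores the other conditions a real dispatcher checks, such as free reservation stations, rename buffers, and reorder buffer entries.

```python
# Number of functional units of each kind on the 604e, per the overview slide.
UNITS = {"SCIU": 2, "MCIU": 1, "LSU": 1, "FPU": 1, "BPU": 1, "CRU": 1}


def dispatch_group(queue, max_dispatch=4):
    """Pick the instructions that dispatch this cycle.

    queue is a list of (name, unit) pairs in program order. Dispatch is in
    order, so the first instruction that cannot go blocks everything behind it.
    """
    free = dict(UNITS)
    dispatched = []
    for name, unit in queue:
        if len(dispatched) == max_dispatch or free[unit] == 0:
            break
        free[unit] -= 1
        dispatched.append(name)
    return dispatched
```

For example, with the stream `and, or, fadd, fsub` the second floating-point instruction has to wait a cycle in this model, because the single FPU already accepted `fadd`.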
14
Execute (E), Complete (C), Writeback (W)
• Execute (E): instruction flow is split among the six execution units (seven on the 604e, which adds the CRU). Instructions enter execute from dispatch or from a reservation station. Results are written into a rename buffer entry, and the complete stage is notified.
• Complete (C): ensures that the correct machine state is maintained; monitors instructions in the complete and execute stages. Instructions are removed from the reorder buffer (ROB) when they complete. Results are written back from the rename buffers to the registers at complete or writeback.
• Writeback (W): writes back results from the rename buffers that were not written back during complete.
15
604 Block Diagram – Internal Data paths
16
Reservation Stations & Result Buses
17
Execution Latencies
18
PPC604e Unit Pipeline Stages
19
Example 1: Instruction timing for Cache HIT
21
Example 1: Instruction Timing for cache Hit
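Instruction stream (numbered in program order): 0 AND, 1 OR, 2 FADD, 3 FSUB, 4 ADDC, 5 SUBFC, 6 FMADD, 7 FMSUB, 8 XOR, 9 NEG, 10 FADDS, 11 FSUBS, 12 ADD, 13 SUB
[Timing diagram: clock cycles 1 to 11 across the top; each instruction passes through Fet, DQ, DS, EX, and C/WB]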
23
Example 2: Branch Taken with BTAC Hit
No branch penalty; instruction 4 (OR) is from the target stream.
Instruction stream: 0 AND, 1 LD, 2 ADD, 3 BC (taken), 4 OR, 5 CMP, 6 LD, 7 MULLI
[Timing diagram: clock cycles 1 to 10 across the top; stages Fet, DQ, DS, EX, C/WB. Annotations: "waits for LD, ADD" on instruction 3 (BC); "waits for BC" on instruction 4 (OR)]
• Cycle 2: instructions 4 to 7 are fetched from the target stream, based on the address from the BTAC hit.
• Cycle 5: because the branch is taken, the OR (4) instruction, which could otherwise write back in this cycle, stays in the complete stage and completes and writes back in the next cycle. The CMP (5) instruction also enters the complete stage; LD (6) and MULLI (7) enter the second stages of the LSU and MCIU pipelines, respectively.
• Cycle 5: instructions wait for the LD to retire (write back) and retire with it.
24
Example 2: Branch taken with BTAC HIT No penalty
25
Example 3: Branch taken, BTAC HIT, Icache MISS
26
Ex 4: Branch taken, BTAC Miss, correct at Decode stage
• One-clock penalty, to fetch the target group (2, 3, 4, 5)
• Correction at decode includes branches on the CR (flags) and LR
27
Ex 5: Branch taken, BTAC Miss, correct at Dispatch stage
2-clock branch penalty
28
Example 6: Branch taken, BTAC Miss,
correct at Execute clock penalty
29
Class Example – real dependencies
1 ADD R1, R2, R3 ; R1 = R2 + R3
2 ADD R2, R1, R4
3 OR R3, R1, R4
4 SUB R3, R2, R3
5 FMUL F7, F5, F6
6 FSUB F8, F10, F7
7 AND R4, R1, R3
[Timing diagram: clock cycles 1 to 10 across the top; instructions 1 to 7 down the side; stages Fet, DQ, DS, EX, C/WB]
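The real (read-after-write) dependencies in the class example can be found mechanically by tracking the most recent writer of each register. The sketch below assumes the operand order shown in the first line's comment (destination first, then two sources); the tuple encoding and the function name are choices made for this illustration.

```python
# (mnemonic, destination, sources), numbered 1..7 in program order.
PROGRAM = [
    ("ADD",  "R1", ("R2", "R3")),
    ("ADD",  "R2", ("R1", "R4")),
    ("OR",   "R3", ("R1", "R4")),
    ("SUB",  "R3", ("R2", "R3")),
    ("FMUL", "F7", ("F5", "F6")),
    ("FSUB", "F8", ("F10", "F7")),
    ("AND",  "R4", ("R1", "R3")),
]


def raw_dependencies(program):
    """Return (consumer, producer, register) triples for every RAW dependency."""
    last_writer = {}  # register -> number of the latest instruction writing it
    deps = []
    for number, (_, dest, sources) in enumerate(program, start=1):
        # Sources are checked before the destination is recorded, so an
        # instruction such as SUB R3, R2, R3 reads the older value of R3.
        for reg in sources:
            if reg in last_writer:
                deps.append((number, last_writer[reg], reg))
        last_writer[dest] = number
    return deps


for consumer, producer, reg in raw_dependencies(PROGRAM):
    print(f"instruction {consumer} needs {reg} from instruction {producer}")
```

Instruction 5 (FMUL) depends on nothing earlier in the sequence, while instruction 7 (AND) takes R3 from instruction 4 rather than instruction 3, since 4 is the later writer.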
31
PPC604 Pipeline
32
Pipeline Details: Fetch Stage
• Fetches instructions from the I-cache and loads the decode queue (DEQ)
• Determines the address of the next instruction to be fetched
• Keeps the queue supplied with instructions for dispatch
• Instructions are fetched from the I-cache in groups of four, from a single cache block
• If only two instructions remain in the cache block, only two instructions are fetched
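The effect of the cache-block boundary on fetch width can be worked out with a few lines of arithmetic. The 32-byte block size (eight 4-byte instructions) is an assumption of this sketch, as are the constant and function names; the slides state only that fetch groups hold up to four instructions and do not cross a block.

```python
INSTRUCTION_BYTES = 4
BLOCK_BYTES = 32   # assumed cache block size: eight instructions
MAX_FETCH = 4      # instructions fetched per group, from the slide


def fetch_group_size(fetch_address):
    """Number of instructions fetched when fetching starts at fetch_address."""
    offset = fetch_address % BLOCK_BYTES
    remaining = (BLOCK_BYTES - offset) // INSTRUCTION_BYTES
    # A group never crosses into the next cache block.
    return min(MAX_FETCH, remaining)
```

Under these assumptions a branch target that lands six instructions into a block yields a group of two, which matches the case the slide calls out.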
33
Next instruction fetch address:
• Each stage offers a candidate address to be fetched; the latest stage has the highest priority
• As a block is prefetched, the branch target address cache (BTAC) and branch history table (BHT) are searched with the fetch address
• If the fetch address hits in the BTAC, the next instructions are fetched from the target address stored there
• DECODE may indicate, based on the BHT or on decoding an unconditional branch, that an earlier BTAC prediction was incorrect
• The BPU can indicate that a previous branch prediction, from the BTAC or DECODE, was incorrect
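The "latest stage wins" rule can be sketched as a fixed priority list. The stage labels, the dictionary interface, and the inclusion of a sequential fallback are assumptions made for this example; DISPATCH is included because one of the earlier examples corrects a branch at the dispatch stage.

```python
# Highest priority first: corrections from later pipeline stages override
# guesses made by earlier ones. The labels are illustrative.
STAGE_PRIORITY = ["BPU", "DISPATCH", "DECODE", "BTAC", "SEQUENTIAL"]


def next_fetch_address(candidates):
    """Choose the next fetch address from per-stage candidates.

    candidates maps a stage label to the address it proposes, or omits the
    stage (or maps it to None) when that stage has nothing to propose.
    """
    for stage in STAGE_PRIORITY:
        address = candidates.get(stage)
        if address is not None:
            return stage, address
    raise ValueError("no candidate fetch address")
```

In a cycle where the BTAC proposes a target but the BPU has just resolved an older branch the other way, the BPU's address is chosen and the BTAC's guess is discarded.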
34
Decode Stage
• Handles time-critical decoding of instructions in the instruction buffer
• Contains a four-instruction buffer (DEQ); shifts one or two pairs of instructions into the dispatch buffer as space becomes available
• Branch correction predicts branches whose target is taken from the CTR or LR; this occurs only if no CTR or LR updates are pending
35
Dispatch Stage
• Performs non-time-critical decoding of instructions supplied by the decode stage
• Determines which instructions can be dispatched
• Source operands are read from the register file and dispatched to the execution units
• Dispatched instructions and their operands are latched into reservation stations or execution unit input latches
• Each dispatched instruction is issued a position in the 16-entry completion buffer
• A rename buffer is allocated to the instruction if needed
36
Execute Stage
• An instruction is passed to the appropriate execution unit after fetch, decode, and dispatch
• The execution units have different latencies
• The floating-point unit is a fully pipelined, three-stage execution unit
• Execution units write their results into the appropriate rename buffer and notify the complete stage
37
Branch Mispredict / Exceptions?
What if a branch instruction was mispredicted in an earlier stage?
• Instructions from the mispredicted path are flushed
• Fetching resumes at the correct address
If an instruction causes an exception, the execution unit reports the exception to the complete stage and continues executing instructions.
38
Complete Stage: maintains the correct architectural machine state
• As instructions finish execution, their status is recorded in a completion buffer (FIFO) entry
• Entries are examined in the order in which the instructions were dispatched; this retains program order and ensures that instructions complete in order
• Four entries are examined during each cycle for writeback
• The completion buffer is used to ensure a precise exception model
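In-order completion on top of out-of-order execution can be sketched with a FIFO. The 16-entry size and the four entries examined per cycle come from the slides; the class, its method names, and the use of opaque tags to identify instructions are assumptions of this illustration, and exception handling is left out.

```python
from collections import deque


class CompletionBuffer:
    """FIFO of dispatched instructions, retired strictly in program order."""

    def __init__(self, size=16, retire_width=4):
        self.size = size
        self.retire_width = retire_width
        self.entries = deque()  # oldest instruction on the left

    def dispatch(self, tag):
        """Allocate an entry at dispatch; return False if the buffer is full."""
        if len(self.entries) >= self.size:
            return False
        self.entries.append({"tag": tag, "finished": False})
        return True

    def finish(self, tag):
        """An execution unit reports that the instruction has finished."""
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["finished"] = True
                return
        raise KeyError(tag)

    def retire(self):
        """One cycle of the complete stage: retire up to retire_width
        finished instructions from the head, stopping at the first
        instruction that has not finished."""
        retired = []
        while (self.entries
               and len(retired) < self.retire_width
               and self.entries[0]["finished"]):
            retired.append(self.entries.popleft()["tag"])
        return retired
```

Because retirement stops at the first unfinished entry, younger instructions that finished early simply wait in the buffer, which is what keeps the architectural state precise when an older instruction raises an exception.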
39
Write-Back Stage
• Writes back results from the rename buffers that were not written back by the complete stage
• Each rename buffer has two read ports for write-back, corresponding to the two write-back ports provided for the GPRs, FPRs, and CR
• Two results can be copied from the rename buffers to the registers per clock cycle