1
Flow Path Model of Superscalars
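[Figure: flow path model of a superscalar processor. Instruction flow: the I-cache and branch predictor feed FETCH, the instruction buffer, and DECODE. Register data flow: integer, floating-point, media, and memory units in EXECUTE, with the reorder buffer (ROB) at COMMIT. Memory data flow: store queue and D-cache.]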
2
Instruction Fetch Buffer
[Figure: fetch unit feeding the out-of-order core through the fetch buffer]
- Fetch buffer smooths out the rate mismatch between fetch and execution
  - neither the fetch bandwidth nor the execution bandwidth is consistent
- Fetch bandwidth should be higher than execution bandwidth
  - we prefer to have a stockpile of instructions in the buffer to hide cache miss latencies
- This requires both raw cache bandwidth + control flow speculation
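The smoothing effect of the fetch buffer can be sketched with a toy cycle-by-cycle model. Everything here is an assumption made for illustration: the function name `executed_total`, the per-cycle fetch and demand traces, and the rule that fetched instructions which do not fit in the buffer are simply not accepted that cycle.

```python
def executed_total(fetch_per_cycle, exec_per_cycle, capacity):
    """Count instructions executed when a buffer of `capacity` entries
    sits between fetch and execution (toy model, not a real pipeline)."""
    occupancy = 0
    executed = 0
    for fetched, demand in zip(fetch_per_cycle, exec_per_cycle):
        # Accept only as many fetched instructions as fit in the buffer.
        occupancy = min(capacity, occupancy + fetched)
        # Execution consumes what it wants, limited by what is buffered.
        issued = min(demand, occupancy)
        occupancy -= issued
        executed += issued
    return executed


# Hypothetical trace: fetch delivers 4 per cycle but is idle for two
# cycles after every two (e.g. an I-cache miss); execution wants 2 per cycle.
fetch_trace = [4, 4, 0, 0, 4, 4, 0, 0]
exec_trace = [2] * 8

small = executed_total(fetch_trace, exec_trace, capacity=2)
large = executed_total(fetch_trace, exec_trace, capacity=8)
print(small, large)
```

In this trace the larger buffer stockpiles the surplus from the high-bandwidth fetch cycles, so execution keeps going through the cycles in which fetch delivers nothing.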
3
Instruction Flow Bandwidth
4
Instruction Cache Basic
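[Figure: instruction cache array. A row decoder selects one of 8 rows (000 to 111) using the RRR bits of the PC, and a multiplexer selects one of 4 columns (00 to 11) using the CC bits. PC = ..xxRRRCC00.]
Example: 4 instructions per cache line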
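The PC layout ..xxRRRCC00 from the slide can be unpacked with a few shifts and masks. This is a minimal sketch that assumes 4-byte instructions (the trailing 00), two column bits (CC), and three row bits (RRR); the function name `decode_pc` is made up for this example.

```python
def decode_pc(pc):
    """Split a PC of the form ..xxRRRCC00 into (row, column, byte_offset)."""
    byte_offset = pc & 0b11        # low 2 bits: always 00 for aligned 4-byte instructions
    column = (pc >> 2) & 0b11      # CC: which of the 4 instructions in the line
    row = (pc >> 4) & 0b111        # RRR: which of the 8 rows the row decoder selects
    return row, column, byte_offset


# PC bits RRR=101, CC=10, offset=00
print(decode_pc(0b1011000))  # (5, 2, 0)
```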
5
Spatial Locality and Fetch Bandwidth
[Figure: the same cache array (row decoder, PC = ..xxRRRCC00). One row access reads out the four instructions of a cache line together.]
6
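Fetch Group Misalignment
[Figure: the same cache array. A fetch group that does not start at a row boundary spans two rows, so it cannot be delivered in one access: part of the group arrives in cycle i and the remainder (Inst3??) in cycle i+1.]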
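The cost of a misaligned fetch group can be made concrete with a short sketch. It assumes 4 instructions per cache line, as in the earlier example, and that one row is read per cycle; the function names are hypothetical.

```python
LINE_SIZE = 4  # instructions per cache line (from the 4-instruction example)


def fetched_in_first_cycle(start_column, group_size=LINE_SIZE):
    """Instructions delivered by the first row access."""
    return min(group_size, LINE_SIZE - start_column)


def cycles_for_group(start_column, group_size=LINE_SIZE):
    """Row accesses needed to deliver the whole fetch group."""
    last_slot = start_column + group_size - 1
    return last_slot // LINE_SIZE + 1


for col in range(LINE_SIZE):
    print(col, fetched_in_first_cycle(col), cycles_for_group(col))
```

Only a group starting at column 0 arrives in a single cycle; any other starting column leaves part of the group for the next row access.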
7
IBM RS/6000 Auto-alignment
[Figure: IBM RS/6000 I-cache with auto-alignment. The IFAR drives T-logic and muxes in front of interleaved SRAM columns (rows up to 255) holding A1 A5 A9 A13 / B1 B5 B9 B13, A2 A6 A10 A14 / B2 B6 B10 B14, and A3 A7 A11 A15 / B3 B7 B11 B15, with odd and even directories for sets A and B, TLB hit control and buffer, and an instruction buffer network feeding interlock, dispatch, branch, and execution.]
- 2-way set associative I-Cache, inst SRAM modules
- 16 instructions per cache line (**What is a cache line?)
8
Instruction Decoding Issues
Primary tasks:
- Identify individual instructions
- Determine instruction types
- Detect inter-instruction dependences
Two important factors:
- Instruction set architecture
- Width of parallel pipeline
9
Intel Pentium Pro Fetch/Decode Unit
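[Figure: Pentium Pro fetch/decode unit. x86 macro-instruction bytes from the IFU enter a 16-byte instruction buffer that feeds three decoders, the uROM, and the branch address calculation (to next address calc.). The first decoder produces up to 4 uops and decoders 1 and 2 produce 1 uop each into a 6-entry uop queue; up to 3 uops are issued to dispatch.]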
10
Predecoding in the AMD K5
[Figure: AMD K5 predecoding. 8 instruction bytes (64 bits) arrive from memory; the predecode logic adds 5 bits per byte (Byte1, Byte2, ..., Byte8), so the I-cache stores 8 instruction bytes as 64 + 40 predecode bits. The I-cache delivers 16 instruction bytes (128 + 80 predecode bits) to decode, translate, and dispatch, which produces up to 4 ROPs (ROP1 to ROP4).]
- Predecoding is also useful for RISC ISAs!!
- Cost: cache size, refill time
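The bit counts on this slide follow from 5 predecode bits per instruction byte. The sketch below only reproduces that arithmetic; the function name `stored_bits` is an assumption for illustration.

```python
PREDECODE_BITS_PER_BYTE = 5  # from the slide: 5 predecode bits per instruction byte


def stored_bits(instruction_bytes):
    """Return (instruction bits, predecode bits) for a block of bytes."""
    return instruction_bytes * 8, instruction_bytes * PREDECODE_BITS_PER_BYTE


print(stored_bits(8))   # (64, 40): the refill path from memory
print(stored_bits(16))  # (128, 80): the I-cache to decode path

# Extra storage relative to the raw instruction bits.
overhead = PREDECODE_BITS_PER_BYTE / 8
print(overhead)  # 0.625
```

The 62.5% storage overhead is the "cache size" cost named on the slide.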
11
Control Dependence
12
IBM’s Experience on Pipelined Processors [Agerwala and Cocke 1987]
Code Characteristics (dynamic)
- loads: 25%
- stores: 15%
- ALU/RR: 40%
- branches: 20%
  - 1/3 unconditional (always taken); unconditional: 100% schedulable
  - 1/3 conditional taken
  - 1/3 conditional not taken; conditional: 50% schedulable
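As a rough worked example, the branch mix above can be combined into two derived numbers: the share of all dynamic instructions that are taken branches, and the share of branches that are schedulable. Combining the percentages this way is an assumption of this sketch, not something stated on the slide.

```python
from fractions import Fraction

BRANCH_FRACTION = Fraction(20, 100)   # branches: 20% of dynamic instructions
UNCONDITIONAL = Fraction(1, 3)        # always taken, 100% schedulable
COND_TAKEN = Fraction(1, 3)
COND_NOT_TAKEN = Fraction(1, 3)


def taken_branch_fraction():
    """Fraction of all dynamic instructions that are taken branches."""
    return BRANCH_FRACTION * (UNCONDITIONAL + COND_TAKEN)


def schedulable_branch_fraction():
    """Fraction of branches that are schedulable (100% of unconditional, 50% of conditional)."""
    conditional = COND_TAKEN + COND_NOT_TAKEN
    return UNCONDITIONAL * 1 + conditional * Fraction(1, 2)


print(taken_branch_fraction())        # 2/15
print(schedulable_branch_fraction())  # 2/3
```

Under these assumptions roughly 13% of dynamic instructions redirect the fetch stream, and a third of all branches cannot be scheduled.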
13
Control Flow Graph
Shows possible paths of control flow through basic blocks
Control Dependence
Node X is control dependent on Node Y if the computation in Y determines whether X executes
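The definition of control dependence can be checked on a small graph. The sketch below uses a hypothetical diamond-shaped CFG (A branches to B or C, both rejoin at D) and a path-enumeration test that only works for acyclic graphs; a real compiler would use post-dominator analysis instead.

```python
# Hypothetical CFG: each basic block maps to its successor blocks.
CFG = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}


def paths_to_exit(cfg, node):
    """All paths from `node` to a block with no successors (acyclic CFG only)."""
    if not cfg[node]:
        return [[node]]
    return [[node] + rest
            for succ in cfg[node]
            for rest in paths_to_exit(cfg, succ)]


def is_control_dependent(cfg, x, y):
    """True if Y's outcome decides whether X executes: some successor of Y
    always leads through X, yet some path from Y to the exit avoids X."""
    if x == y:
        return False
    forced = any(all(x in path for path in paths_to_exit(cfg, succ))
                 for succ in cfg[y])
    avoidable = any(x not in path for path in paths_to_exit(cfg, y))
    return forced and avoidable


print(is_control_dependent(CFG, "B", "A"))  # True
print(is_control_dependent(CFG, "D", "A"))  # False
```

B and C execute only if the branch in A goes their way, so they are control dependent on A; D executes on every path, so it is not.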
14
Mapping CFG to Linear Instruction Sequence
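[Figure: a control flow graph whose basic blocks (B, C, D) are mapped to alternative linear instruction sequences, with the blocks placed in different orders.]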
15
Branch Types and Implementation
Types of Branches
- Conditional or Unconditional?
- Subroutine Call (aka Link), needs to save PC?
- How is the branch target computed?
  - Static target, e.g. immediate, PC-relative
  - Dynamic target, e.g. register indirect
Conditional Branch Architectures
- Condition Code ‘N-Z-C-V’, e.g. PowerPC
- General Purpose Register, e.g. Alpha, MIPS
- Special Purpose Register, e.g. Power’s Loop Count
16
Condition Resolution
17
Target Address Generation
18
What’s So Bad About Branches?
Performance Penalties
- Use up execution resources
- Fragmentation of I-Cache lines
- Disruption of sequential control flow
  - Need to determine branch direction (conditional branches)
  - Need to determine branch target
Robs instruction fetch bandwidth and ILP
19
Riseman and Foster’s Study
- 7 benchmark programs on CDC-3600
- Assume infinite machine:
  - Infinite memory and instruction stack, register file, functional units
  - Consider only true dependency at data-flow limit
- If bounded to a single basic block, i.e. no bypassing of branches, maximum speedup is 1.72
- Suppose one can bypass conditional branches and jumps (i.e. assume the actual branch path is always known such that branches do not impede instruction execution)
Br. Bypassed: Max Speedup: