The AMD K8 Processor Architecture, December 14th 2006.

1 The AMD K8 Processor Architecture, December 14th 2006

2 K7 vs K8
K7: 3 x86 decoding units, 3 integer units (ALU), 3 floating point units (FPU), 128KB L1 cache.
K8: 3 decoders (16 bytes of instructions per clock cycle).
- x86 instructions are decoded into fixed-length micro-operations (µOPs).
- Complex instructions are decoded into 2 or more µOPs.
- FastPath: certain µOPs are packed together.
- µOPs are then dispatched to the execution units.
- 3 Address Generation Units (AGU) for loads and stores.
- Three integer units (ALU): most µOPs execute in one cycle; multiplication has a 3-cycle latency in 32 bits and a 5-cycle latency in 64 bits.
- Three floating point units (FPU), which handle x87, MMX, 3DNow!, SSE and SSE2 instructions.
- Load/Store stage: the L1 is dual-ported, meaning it can handle two 64-bit reads or writes each clock cycle.
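The dual-ported L1 figure on this slide translates directly into a peak transfer rate. The following is a minimal sketch of that arithmetic; the function names and the 2.0 GHz clock are illustrative assumptions, not values taken from the slides.

```python
# Peak L1 data-cache bandwidth for a dual-ported cache.
# From the slide: 2 ports, each 64 bits wide, usable every clock cycle.
# The clock frequency used below is an assumed example value.

PORTS = 2
PORT_WIDTH_BITS = 64


def l1_bytes_per_cycle(ports: int = PORTS, width_bits: int = PORT_WIDTH_BITS) -> int:
    """Bytes the L1 can transfer per clock when every port is busy."""
    return ports * width_bits // 8


def l1_peak_bandwidth_gb_s(clock_hz: float) -> float:
    """Peak bandwidth in GB/s (10**9 bytes per second) at a given clock."""
    return l1_bytes_per_cycle() * clock_hz / 1e9


if __name__ == "__main__":
    print(l1_bytes_per_cycle())           # 16 bytes per cycle
    print(l1_peak_bandwidth_gb_s(2.0e9))  # 32.0 GB/s at an assumed 2.0 GHz
```

This is an upper bound only: it assumes both ports are busy on every cycle, which real code rarely achieves.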

3 K8 Hammer Microarchitecture

4 K7 vs K8 Pipelines

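5 K8 L1 and L2 Cache

The L1 cache (TC = trace cache)
- Size: K8: code 64KB, data 64KB; Athlon XP: code 64KB, data 64KB; Pentium 4 Northwood: TC 12Kµops, data 8KB; Pentium 4 Prescott: TC 12Kµops, data 16KB
- Associativity: K8 and Athlon XP: code 2 way, data 2 way; Pentium 4 Northwood: TC 8 way, data 4 way; Pentium 4 Prescott: TC 8 way, data 8 way
- Cache line size: K8 and Athlon XP: code 64 bytes, data 64 bytes; Pentium 4 Northwood and Prescott: TC n.a., data 64 bytes
- Write policy: K8 and Athlon XP: Write Back; Pentium 4 Northwood and Prescott: Write Through
- Latency: K8 and Athlon XP: 3 cycles; Pentium 4 Northwood: 2 cycles; Pentium 4 Prescott: 4 cycles

The L2 cache
- Size: K8: 512KB (Newcastle), 1024KB (Hammer); Athlon XP: 256 and 512KB; Pentium 4 Northwood: 512KB; Pentium 4 Prescott: 1024KB
- Associativity: K8 and Athlon XP: 16 way; Pentium 4 Northwood and Prescott: 8 way
- Cache line size: 64 bytes (all four CPUs)
- Latency (given by manufacturer): K8: ?; Athlon XP: 8 cycles; Pentium 4 Northwood: 7 cycles; Pentium 4 Prescott: 11 cycles
- Bus width: K8: 128 bits; Athlon XP: 64 bits; Pentium 4 Northwood and Prescott: 256 bits
- L1 relationship: K8 and Athlon XP: exclusive; Pentium 4 Northwood and Prescott: inclusive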
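Size, associativity and line size together determine how many sets a cache has and how an address splits into offset and index bits. The sketch below works that out for a few rows of the tables on this slide; `cache_geometry` is a hypothetical helper written for illustration, and it assumes all three parameters are powers of two.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes size, associativity and line size are powers of two, which
    holds for every cache listed on the slide.
    """
    sets = size_bytes // (ways * line_bytes)
    if sets * ways * line_bytes != size_bytes or sets & (sets - 1):
        raise ValueError("parameters do not describe a power-of-two cache")
    offset_bits = line_bytes.bit_length() - 1  # log2(line size)
    index_bits = sets.bit_length() - 1         # log2(number of sets)
    return sets, offset_bits, index_bits


KB = 1024

if __name__ == "__main__":
    # Parameters copied from the slide's tables.
    print(cache_geometry(64 * KB, 2, 64))    # K8 L1 data: (512, 6, 9)
    print(cache_geometry(512 * KB, 16, 64))  # K8 L2 (Newcastle): (512, 6, 9)
    print(cache_geometry(8 * KB, 4, 64))     # Northwood L1 data: (32, 6, 5)
    print(cache_geometry(16 * KB, 8, 64))    # Prescott L1 data: (32, 6, 5)
```

With 64-byte lines the low 6 address bits select a byte within the line; the next `index_bits` bits select the set, and the remaining high bits form the tag.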

6 Exclusive vs Inclusive Cache

Exclusive L1-L2
- Design: a cache line (instructions or data) held in the L1 is not also kept in the L2.
- Positive: no constraint on the L2 size (it can be small). Total cache size is the sum of the sub-level sizes.
- Negative: L2 performance is impaired (latency). A victim buffer is needed.

Inclusive L1-L2
- Design: the content of the L1 cache is duplicated in the L2 cache.
- Positive: L2 performance is improved.
- Negative: constraint on the L1/L2 size ratio (a relatively large L2 is required). Total cache size may be smaller.
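The capacity difference between the two policies can be seen in a toy model. The sketch below is an assumption-heavy illustration, not a model of the K8 or Pentium 4 hardware: both levels are fully associative with LRU replacement, capacities are counted in lines, and the class and method names are hypothetical.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Toy fully associative L1/L2 pair with LRU replacement."""

    def __init__(self, l1_lines: int, l2_lines: int, exclusive: bool):
        self.l1 = OrderedDict()
        self.l2 = OrderedDict()
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines
        self.exclusive = exclusive

    @staticmethod
    def _insert(level, capacity, line):
        """Insert as most recently used; return the evicted line or None."""
        level[line] = True
        level.move_to_end(line)
        if len(level) > capacity:
            victim, _ = level.popitem(last=False)
            return victim
        return None

    def access(self, line) -> str:
        if line in self.l1:
            self.l1.move_to_end(line)
            return "L1 hit"
        if line in self.l2:
            result = "L2 hit"
            if self.exclusive:
                del self.l2[line]           # the line moves up, it is not copied
            else:
                self.l2.move_to_end(line)
        else:
            result = "miss"
            if not self.exclusive:
                evicted = self._insert(self.l2, self.l2_lines, line)
                if evicted is not None:
                    self.l1.pop(evicted, None)  # keep L1 a subset of L2
        victim = self._insert(self.l1, self.l1_lines, line)
        if victim is not None and self.exclusive:
            self._insert(self.l2, self.l2_lines, victim)  # L1 victim goes to L2
        return result

    def distinct_lines(self) -> int:
        return len(set(self.l1) | set(self.l2))


if __name__ == "__main__":
    for exclusive in (True, False):
        cache = TwoLevelCache(l1_lines=2, l2_lines=4, exclusive=exclusive)
        for line in range(6):
            cache.access(line)
        print(exclusive, cache.distinct_lines(), cache.access(0))
```

With a 2-line L1 and a 4-line L2, the exclusive hierarchy ends up holding six distinct lines (the sum of both levels), while the inclusive one holds four, because the L1 contents are duplicated in the L2. The cost shows up in `access`: every L1 eviction in the exclusive case triggers a write into the L2, which is the traffic a victim buffer is meant to absorb.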

7 K8 Athlon 64

8 Athlon 64 Operating Modes

9

10

11

12 Opteron vs. Xeon

