1
The AMD K8 Processor Architecture, December 14th, 2006
2
K7 vs K8
K7: 3 x86 decoding units, 3 integer units (ALU), 3 floating point units (FPU), 128KB L1 cache
K8:
- 3 decoders (16 bytes of instructions per clock cycle); x86 instructions are decoded into fixed-length micro-operations (µOPs)
- Complex instructions are decoded into 2 or more µOPs
- FastPath: certain µOPs are packed together
- µOPs are then dispatched to the execution units
- 3 address generation units (AGU) for loads and stores
- 3 integer units (ALU): most µOPs execute in one cycle; multiplication has a 3-cycle latency in 32 bits and a 5-cycle latency in 64 bits
- 3 floating point units (FPU) that handle x87, MMX, 3DNow!, SSE and SSE2 instructions
- Load/Store stage: the L1 is dual-ported, meaning it can handle two 64-bit reads or writes each clock cycle
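The dual-ported L1 figure above translates directly into a peak bandwidth number. The following is a minimal sketch of that arithmetic; the function names and the 2.0 GHz clock are illustrative assumptions, not values taken from the slides.

```python
def l1_peak_bytes_per_cycle(ports=2, access_bits=64):
    """Peak L1 transfer per clock: ports x access width, in bytes."""
    return ports * access_bits // 8


def l1_peak_gb_per_s(clock_ghz, ports=2, access_bits=64):
    """Peak L1 bandwidth in GB/s for a given core clock (GHz)."""
    return l1_peak_bytes_per_cycle(ports, access_bits) * clock_ghz


# Two 64-bit accesses per cycle, as stated for the K8 L1.
print(l1_peak_bytes_per_cycle())   # 16
# Hypothetical 2.0 GHz core clock (assumption for illustration only).
print(l1_peak_gb_per_s(2.0))       # 32.0
```

This is a theoretical ceiling only; sustained bandwidth depends on the access pattern and on what the rest of the pipeline can issue.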
3
K8 Hammer Microarchitecture
4
K7 vs K8 Pipelines
5
K8 L1 and L2 Cache

The L1 cache
CPU | K8 | Athlon XP | Pentium 4 Northwood | Pentium 4 Prescott
Size | code: 64KB, data: 64KB | code: 64KB, data: 64KB | TC: 12Kµops, data: 8KB | TC: 12Kµops, data: 16KB
Associativity | code: 2-way, data: 2-way | code: 2-way, data: 2-way | TC: 8-way, data: 4-way | TC: 8-way, data: 8-way
Cache line size | code: 64 bytes, data: 64 bytes | code: 64 bytes, data: 64 bytes | TC: n.a., data: 64 bytes | TC: n.a., data: 64 bytes
Write policy | Write back | Write back | Write through | Write through
Latency | 3 cycles | 3 cycles | 2 cycles | 4 cycles

The L2 cache
CPU | K8 | Athlon XP | Pentium 4 Northwood | Pentium 4 Prescott
Size | 512KB (Newcastle), 1024KB (Hammer) | 256 and 512KB | 512KB | 1024KB
Associativity | 16-way | 16-way | 8-way | 8-way
Cache line size | 64 bytes | 64 bytes | 64 bytes | 64 bytes
Latency (given by manufacturer) | ? | 8 cycles | 7 cycles | 11 cycles
Bus width | 128 bits | 64 bits | 256 bits | 256 bits
L1 relationship | exclusive | exclusive | inclusive | inclusive
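The size, associativity and line-size figures above determine how many sets each cache has and how an address splits into offset and index bits. A minimal sketch of that calculation follows; `cache_geometry` is a hypothetical helper and it assumes every parameter is a power of two.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes size, ways and line size are powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2(line size)
    index_bits = sets.bit_length() - 1         # log2(number of sets)
    return sets, offset_bits, index_bits


KB = 1024
print(cache_geometry(64 * KB, 2, 64))    # K8 L1 data        -> (512, 6, 9)
print(cache_geometry(8 * KB, 4, 64))     # Northwood L1 data -> (32, 6, 5)
print(cache_geometry(512 * KB, 16, 64))  # K8 L2 (Newcastle) -> (512, 6, 9)
```

The remaining high-order address bits form the tag; how many there are depends on the physical address width, which the slides do not give.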
6
Exclusive vs Inclusive Cache

Exclusive L1-L2
Design: a cache line (instructions/data) is not duplicated between L1 and L2; each line lives in only one of the two levels.
Positive: no constraint on the L2 size (it can be small); total cache size is the sum of the sub-level sizes.
Negative: L2 performance impaired (latency); need to use a victim buffer.

Inclusive L1-L2
Design: duplicates the content of the L1 cache in the L2 cache.
Positive: L2 performance improved.
Negative: constraint on the L1/L2 size ratio (relatively large L2); effective total cache size may be smaller, since L1 contents are also stored in L2.
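The capacity difference between the two designs can be illustrated with a toy model. This is a sketch under simplifying assumptions, not a description of the K8 or Pentium 4 hardware: both levels are fully associative with LRU replacement, capacities are counted in cache lines, and the class name `TwoLevelCache` is hypothetical.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Toy fully associative LRU L1+L2 model; capacities are in cache lines."""

    def __init__(self, l1_lines, l2_lines, exclusive):
        self.l1 = OrderedDict()
        self.l2 = OrderedDict()
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines
        self.exclusive = exclusive

    def _insert_l2(self, line):
        self.l2[line] = True
        self.l2.move_to_end(line)
        if len(self.l2) > self.l2_lines:
            victim, _ = self.l2.popitem(last=False)
            if not self.exclusive:
                # Inclusive: a line evicted from L2 may not remain in L1.
                self.l1.pop(victim, None)

    def access(self, line):
        """Touch a line and return where it was found: 'L1', 'L2' or 'MEM'."""
        if line in self.l1:
            self.l1.move_to_end(line)
            return "L1"
        if line in self.l2:
            level = "L2"
            if self.exclusive:
                del self.l2[line]          # line moves up instead of being copied
            else:
                self.l2.move_to_end(line)
        else:
            level = "MEM"
            if not self.exclusive:
                self._insert_l2(line)      # inclusive fill goes into both levels
        self.l1[line] = True
        if len(self.l1) > self.l1_lines:
            victim, _ = self.l1.popitem(last=False)
            if self.exclusive:
                self._insert_l2(victim)    # L1 victim spills into L2
        return level

    def distinct_lines(self):
        """Number of different lines held across both levels."""
        return len(set(self.l1) | set(self.l2))


for exclusive in (True, False):
    cache = TwoLevelCache(l1_lines=4, l2_lines=8, exclusive=exclusive)
    for line in range(100):
        cache.access(line)
    print("exclusive" if exclusive else "inclusive", cache.distinct_lines())
# exclusive 12
# inclusive 8
```

With 4 L1 lines and 8 L2 lines, the exclusive hierarchy ends up holding 12 distinct lines (the sum of both levels), while the inclusive one holds 8 (the L2 size only), which matches the trade-off listed above.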
7
K8 Athlon 64
8
Athlon 64 Operating Modes
12
Opteron VS. Xeon