* From AMD 1996 Publication #18522 Revision E

1 * From AMD 1996 Publication #18522 Revision E
07/16/96 ENEL515 AMD-K5 Processor From AMD 1996 Publication #18522 Revision E 1/18/2019 ENEL AMD on K5 Copyright M. Smith

2 Compare 1978 and 1996 CISC processors
Compare a Motorola CISC processor (based on the 1978/81 era) with an AMD K5 CISC processor (1996 era)
Look at the features common to the AMD K5 CISC and the 21K DSP
Comment on the paper “Microprocessors outperform DSP 2:1”

3 ENEL515 -- AMD on K5 Copyright M. Smith smith@enel.ucalgary.ca
68332 Block Diagram -- CISC

4 68332 block detail

5 68K registers and ALU

6 Problems with CISC compatibility
X86 CISC architecture -- dominant standard over many generations
Backward compatibility brings the inherent limitations of X86 CISC
  variable-length instructions
  few general registers
  complex addressing modes
The same comments could be made about the 68K CISC

7 K5 overcomes X86 backward-compatibility problems
Superscalar RISC core
Instruction predecoding
Improved cache
Branch prediction
Speculative execution
Out-of-order execution
Register renaming

9 64-bit data bus interface
64-bit data bus
Cache/burst-oriented line refill for both the instruction cache and the data cache
A cache refill takes five clock cycles per cache line -- (cache line -- 4 instructions?)
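The five-clock refill can be checked with a little arithmetic. The sketch below is only an illustration: the 64-bit bus width and the five clocks come from the slide, while the 32-byte line size and the 2-1-1-1 burst pattern are assumptions that the slide does not state.

```python
BUS_WIDTH_BITS = 64   # from the slide
REFILL_CLOCKS = 5     # from the slide
LINE_BYTES = 32       # assumption: line size is not given on the slide


def transfers_per_line(line_bytes, bus_width_bits):
    """Number of bus transfers needed to fill one cache line."""
    bytes_per_transfer = bus_width_bits // 8
    return line_bytes // bytes_per_transfer


def burst_clocks(first_transfer_clocks, transfers):
    """Clocks for a burst: the first transfer is slow, each later one takes 1 clock."""
    return first_transfer_clocks + (transfers - 1)


transfers = transfers_per_line(LINE_BYTES, BUS_WIDTH_BITS)
# Assumed 2-1-1-1 burst: 2 clocks for the first transfer, then 1 clock each.
clocks = burst_clocks(2, transfers)
print(transfers, clocks)
```

Under these assumptions a line is four 8-byte bus transfers rather than four instructions, and the burst adds up to the five clocks quoted on the slide.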

10 Cache Architecture
Separate instruction and data caches
Permits snooping and aliasing?
Cache can be retained after context switches
8-Kbyte data cache -- two cache lines of data accessed in 1 cycle to overcome the X86 bottleneck
Uses MESI (Modified, Exclusive, Shared, Invalid) to maintain data coherency with other system caches and ensure valid reads
Write-back cache updates (external) memory only when necessary, to keep the bus free
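The MESI states named above can be sketched as a small state machine for one cache line. This is the textbook protocol in simplified form, not the K5's actual bus behaviour; the event names (`local_read`, `local_write`, `snoop_read`, `snoop_write`) and the `others_have_copy` flag are hypothetical names chosen for the illustration.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def next_state(state, event, others_have_copy=False):
    """Return (new_state, writes_back) for one cache line.

    writes_back is True when a modified line must be written to memory first.
    """
    if event == "local_read":
        if state is State.INVALID:
            # Line fill: shared if another cache holds the line, else exclusive.
            return (State.SHARED if others_have_copy else State.EXCLUSIVE), False
        return state, False
    if event == "local_write":
        # Write-back policy: the line becomes dirty, memory is not updated yet.
        return State.MODIFIED, False
    if event == "snoop_read":
        if state is State.INVALID:
            return State.INVALID, False
        return State.SHARED, state is State.MODIFIED
    if event == "snoop_write":
        return State.INVALID, state is State.MODIFIED
    raise ValueError(f"unknown event: {event}")


print(next_state(State.MODIFIED, "snoop_read"))
```

The write-back only happens when another bus master touches a modified line, which is the "only when necessary" behaviour the slide describes.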

11 Innovative x86 instruction predecoding -- what it means
4th-generation CPU -- 1 X86 instruction per cycle
5th-generation CPU -- 2 X86 instructions per cycle
K X86 instructions per cycle

12 Innovative x86 instruction predecoding -- how it is done
Each byte is tagged with predecode information
  x86 instruction boundaries are identified
  multiple x86 instructions are aligned (length 8 to 120 bits)
Aligned instructions are assigned issue positions for most efficient processing
Predecode information also indicates the number of ROPs needed
After predecoding, instructions are stored in the instruction cache
Speculative instructions (from a predicted branch stream) are pushed to a byte queue for further decoding
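A minimal sketch of per-byte tagging, assuming the instruction lengths and ROP counts are already known. A real predecoder has to work the lengths out from the opcode bytes, which is skipped here. The `PredecodeTag` fields and the function names are hypothetical, not AMD's actual predecode bit layout.

```python
from dataclasses import dataclass


@dataclass
class PredecodeTag:
    start: bool      # first byte of an x86 instruction
    end: bool        # last byte of an x86 instruction
    rop_count: int   # ROPs needed; only meaningful on the start byte


def predecode(instructions):
    """Tag every byte of a stream given (length_bytes, rop_count) pairs."""
    tags = []
    for length, rops in instructions:
        if not 1 <= length <= 15:   # 8 to 120 bits, as on the slide
            raise ValueError("x86 instruction length must be 1 to 15 bytes")
        for i in range(length):
            tags.append(PredecodeTag(start=(i == 0),
                                     end=(i == length - 1),
                                     rop_count=rops if i == 0 else 0))
    return tags


def instruction_starts(tags):
    """Byte offsets at which an instruction begins, read straight from the tags."""
    return [offset for offset, tag in enumerate(tags) if tag.start]


tags = predecode([(1, 1), (3, 2), (2, 1)])
print(instruction_starts(tags))
```

Once the tags are stored alongside the bytes, finding instruction boundaries becomes a lookup instead of a sequential parse of the variable-length stream.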

13 Turning into ROPs

14 Unique x86 instruction conversion and decoding
32 bytes of predecoded X86 instructions are forwarded to the decoder
Decoder converts complex x86 instructions to ROPs -- fixed length, easy to process
Operands for the ROPs are fetched simultaneously from the register file or the re-order buffer
X86 instructions are scanned and allocated to a decode position
Number of ROPs per X86 instruction is known from predecoding -- saves time -- why?

15 Stage II
If an X86 instruction requires fewer than 4 ROPs after conversion, it goes “fast path” to any of the 4 decode positions
Very complex X86 instructions are transferred to microcode ROM for conversion
After decoding, ROPs are sent to reservation stations at the 6 execution units
At the execution units, ROPs may be executed out of order -- faster than compiler optimizations
ROPs wait in reservation stations for operands from the register file, the data cache, or the results of other operations
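The fast-path rule can be illustrated with a toy allocator that packs instructions, in program order, into the 4 decode positions of each cycle. The greedy packing policy and the choice to give a microcoded instruction a cycle of its own are assumptions made for this sketch; the slide only gives the "fewer than 4 ROPs" threshold and the 4 positions.

```python
DECODE_POSITIONS = 4      # from the slide
FAST_PATH_MAX_ROPS = 3    # "fewer than 4 ROPs"


def decode_path(rop_count):
    """Pick the conversion path for one x86 instruction."""
    return "fast path" if rop_count <= FAST_PATH_MAX_ROPS else "microcode ROM"


def pack_decode_cycles(rop_counts):
    """Group instruction indices by decode cycle, keeping program order."""
    cycles = []
    current, used = [], 0
    for index, rops in enumerate(rop_counts):
        if decode_path(rops) == "microcode ROM":
            if current:
                cycles.append(current)
                current, used = [], 0
            cycles.append([index])   # assumption: microcode is not shared with fast path
            continue
        if used + rops > DECODE_POSITIONS:
            cycles.append(current)
            current, used = [], 0
        current.append(index)
        used += rops
    if current:
        cycles.append(current)
    return cycles


print(pack_decode_cycles([1, 2, 1, 3, 5, 1]))
```

The packer needs every ROP count up front, which suggests one reason why having the count available from predecoding saves time: positions can be assigned without decoding each instruction first.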

16 K5 Superscalar RISC core
Six execution units
  two ALUs (integer)
  two load/store units
  branch unit
  floating-point unit (FPU)
Converts variable-length X86 instructions to simple fixed-length RISC operations (ROPs)
Dispatches four ROPs at a time to the superscalar core
Execution rate -- peaks at 6 ROPs per cycle
Register forwarding and data bypassing allow results to be used immediately by the next ROP -- no delay while results go to the destination register and then out again

17 6 Parallel Execution Units

18 Out of order execution
Eliminates delays due to pipeline dependencies
Each execution unit has 2 reservation stations for ROPs
Instructions can be issued in any order from the reservation stations
Execution units act independently -- some work while others stall
16-entry reorder buffer keeps track of the original instruction sequence
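A toy model of out-of-order issue: a ROP leaves its reservation station as soon as its operands are ready, regardless of program order. The sketch assumes one-cycle latency, unlimited execution units, and already-renamed destinations (so only true dependencies remain); it ignores the limit of 2 reservation stations per unit. The ROP names and register tags are hypothetical.

```python
def issue_order(rops, ready):
    """Return the ROP names issued in each cycle.

    rops:  list of (name, source_registers, dest_register) in program order
    ready: registers whose values are available before the first cycle
    """
    pending = list(rops)
    ready = set(ready)
    schedule = []
    while pending:
        # Every waiting ROP whose operands are all available issues this cycle.
        issued = [rop for rop in pending if set(rop[1]) <= ready]
        if not issued:
            raise ValueError("an operand is never produced")
        schedule.append([rop[0] for rop in issued])
        for rop in issued:
            ready.add(rop[2])   # result becomes visible in the next cycle
        pending = [rop for rop in pending if rop not in issued]
    return schedule


program = [
    ("load", ["addr"], "t1"),
    ("add", ["t1", "b"], "t2"),   # must wait for the load
    ("sub", ["c", "d"], "t3"),    # independent, can go ahead of the add
]
print(issue_order(program, {"addr", "b", "c", "d"}))
```

Here `sub` issues in the first cycle alongside `load`, ahead of the earlier `add`, which is the sense in which some units keep working while others wait.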

19 Write-read dependency problem for out-of-order execution

20 Write-write dependency problem for out-of-order execution

21 Register renaming
Original X86 architecture has only 8 general-purpose registers
This increases register reuse (loads and stores to memory) and register dependencies
Register reuse is overcome with multiple load/store execution units and a dual-ported data cache
Register renaming overcomes register dependencies -- multiple logical registers for each physical (architectural X86) register allow execution units to use the same register name simultaneously
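A minimal sketch of renaming, assuming every write is given a fresh tag and every read uses the latest tag for its register. The tag names (`t0`, `t1`, ...) and the `rename` function are hypothetical; in the K5 the renamed copies live in the reorder buffer.

```python
import itertools


def rename(rops, arch_regs):
    """Rename a list of (dest, sources) ROPs given in program order."""
    latest = {reg: reg for reg in arch_regs}   # x86 register -> most recent tag
    counter = itertools.count()
    renamed = []
    for dest, sources in rops:
        new_sources = [latest[src] for src in sources]   # read the newest copies
        tag = f"t{next(counter)}"                        # fresh tag for this write
        latest[dest] = tag
        renamed.append((tag, new_sources))
    return renamed


program = [
    ("eax", ["ebx"]),   # eax <- f(ebx)
    ("ecx", ["eax"]),   # ecx <- f(eax)   reads the first eax
    ("eax", ["edx"]),   # eax <- f(edx)   reuses the name eax
]
print(rename(program, ["eax", "ebx", "ecx", "edx"]))
```

After renaming, the third ROP writes `t2` instead of overwriting the first result `t0`, so it no longer has to wait for the second ROP to read `eax`. Only the true dependency of the second ROP on `t0` remains.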

22 Register renaming -- code

23 Register renaming -- diagram

24 Branch Prediction
Branches occur in X86 programs every 7 instructions on average
Processor predicts which way each branch will go
Prediction is done dynamically -- 75% accuracy, 1024 branch targets are cached
If the prediction is wrong, there is a minimal 3-cycle mispredict penalty
Dynamic branch prediction keeps instructions fed to the execution core and eliminates pipeline bubbles (stalls?)
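The figures on this slide can be combined into a rough cost per instruction. The sketch below assumes every misprediction costs exactly the 3-cycle minimum, so the result is a lower bound; the function name is hypothetical.

```python
def mispredict_cycles_per_instruction(instructions_per_branch, accuracy, penalty_cycles):
    """Average cycles lost to mispredicted branches, per instruction executed."""
    branch_fraction = 1 / instructions_per_branch   # one branch every N instructions
    mispredict_rate = 1 - accuracy
    return branch_fraction * mispredict_rate * penalty_cycles


# Slide values: a branch every 7 instructions, 75% accuracy, 3-cycle penalty.
lost = mispredict_cycles_per_instruction(7, 0.75, 3)
print(round(lost, 3))
```

With the slide's numbers this works out to roughly 0.11 cycles per instruction, that is, about one lost cycle for every nine instructions.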

25 Branch Prediction

26 Without prediction -- time to find instruction in cache

27 Without prediction -- time to find instruction in cache

28 With prediction -- faster throughput

29 Re-order buffer and register file

30 Reorder buffer
Reorder buffer -- key to speculative out-of-order execution (issue and completion)
Reorder buffer is used to rename registers, forward requested intermediate results, and recover from mispredictions
Reorder buffer keeps track of the original instruction sequence and ensures that results are retired in the correct order, with results going to the register file
If a branch is mispredicted, the results of the speculative instructions are invalidated in the re-order buffer before they have any effect on x86 registers or memory
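The retire-in-order and flush-on-mispredict behaviour can be sketched with a small queue. The 16-entry size comes from the earlier out-of-order slide; the class, its method names, and the dictionary used as the register file are hypothetical and much simpler than the real hardware.

```python
from collections import deque


class ReorderBuffer:
    def __init__(self, size=16):
        self.size = size
        self.entries = deque()    # oldest entry on the left
        self.register_file = {}   # architectural (x86) state

    def allocate(self, dest):
        """Reserve an entry in program order; dest is None for a branch."""
        if len(self.entries) >= self.size:
            raise RuntimeError("reorder buffer full")
        entry = {"dest": dest, "value": None, "done": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value):
        """Record a result; completion may happen in any order."""
        entry["value"] = value
        entry["done"] = True

    def retire(self):
        """Commit finished entries from the oldest end only."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            if entry["dest"] is not None:
                self.register_file[entry["dest"]] = entry["value"]
                retired.append(entry["dest"])
        return retired

    def flush_after(self, branch_entry):
        """Discard every entry younger than a mispredicted branch."""
        if not any(entry is branch_entry for entry in self.entries):
            raise ValueError("branch is not in the reorder buffer")
        while self.entries[-1] is not branch_entry:
            self.entries.pop()


rob = ReorderBuffer()
first = rob.allocate("eax")
second = rob.allocate("ebx")
rob.complete(second, 2)    # finishes early
print(rob.retire())        # nothing retires until the older entry is done
rob.complete(first, 1)
print(rob.retire())
```

Because a speculative result sits in the buffer until everything older has retired, flushing the younger entries is enough to undo a wrong prediction without touching the register file.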

31 Register file
X86 architecture has a limited number of general-purpose registers
Fewer registers mean frequent reuse of registers and reduced performance -- the K5 uses register renaming to avoid this problem
Movement between registers and memory is unavoidable with the x86 instruction set
K5 CPU has a single-cycle load from the data cache
Also a multi-ported register file and renaming in the reorder buffer -- near-optimal speculative performance

32 Quote from AMD
The right combination
Compatibility with performance