THE AMD-K7 TM PROCESSOR Microprocessor Forum 1998 Dirk Meyer.

Slides:



Advertisements
Similar presentations
CS 7810 Lecture 22 Processor Case Studies, The Microarchitecture of the Pentium 4 Processor G. Hinton et al. Intel Technology Journal Q1, 2001.
Advertisements

AMD OPTERON ARCHITECTURE Omar Aragon Abdel Salam Sayyad This presentation is missing the references used.
Contents Even and odd memory banks of 8086 Minimum mode operation
Fall EE 333 Lillevik 333f06-l20 University of Portland School of Engineering Computer Organization Lecture 20 Pipelining: “bucket brigade” MIPS.
Better answers The Alpha and Microprocessors: Continuing the Performance Lead Beyond Y2K Shubu Mukherjee, Ph.D. Principal Hardware Engineer.
The AMD Athlon ™ Processor: Future Directions Fred Weber Vice President, Engineering Computation Products Group.
Superscalar Organization Prof. Mikko H. Lipasti University of Wisconsin-Madison Lecture notes based on notes by John P. Shen Updated by Mikko Lipasti.
1 Microprocessor-based Systems Course 4 - Microprocessors.
Intel Labs Labs Copyright © 2000 Intel Corporation. Fall 2000 Inside the Pentium ® 4 Processor Micro-architecture Next Generation IA-32 Micro-architecture.
The AMD K8 Processor Architecture December 14 th 2006.
Advanced Micro Devices - Athlon Buddy Guest Mike Lewitt Bill McCorkle November 28, 2001.
EECS 470 Superscalar Architectures and the Pentium 4 Lecture 12.
1 Pipelining for Multi- Core Architectures. 2 Multi-Core Technology Single Core Dual CoreMulti-Core + Cache + Cache Core 4 or more cores.
PowerPC 601 Stephen Tam. To be tackled today Architecture Execution Units Fixed-Point (Integer) Unit Floating-Point Unit Branch Processing Unit Cache.
COMP381 by M. Hamdi 1 Commercial Superscalar and VLIW Processors.
7-Aug-15 (1) CSC Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32.
Intel Pentium 4 Processor Presented by Presented by Steve Kelley Steve Kelley Zhijian Lu Zhijian Lu.
AMD K7 Processor Architecture
Lect 13-1 Lect 13: and Pentium. Lect Microprocessor Family  Microprocessor  Introduced in 1989  High Integration  On-chip 8K.
CS 152 Computer Architecture and Engineering Lecture 23: Putting it all together: Intel Nehalem Krste Asanovic Electrical Engineering and Computer Sciences.
COMPUTER ARCHITECTURE Assoc.Prof. Stasys Maciulevičius Computer Dept.
Semiconductor Memory 1970 Fairchild Size of a single core –i.e. 1 bit of magnetic core storage Holds 256 bits Non-destructive read Much faster than core.
Simultaneous Multithreading: Maximizing On-Chip Parallelism Presented By: Daron Shrode Shey Liggett.
Please see “portrait orientation” PowerPoint file for Chapter 8 Figure 8.1. Basic idea of instruction pipelining.
By Michael Butler, Leslie Barnes, Debjit Das Sarma, Bob Gelinas This paper appears in: Micro, IEEE March/April 2011 (vol. 31 no. 2) pp 마이크로 프로세서.
Alpha 21364: A Scalable Single-chip SMP
1 Structure of Computer Systems Course 7 – examples of CPU implementations - Microprocessors.
Intel Pentium II Processor Brent Perry Pat Reagan Brian Davis Umesh Vemuri.
High Performance Computing Processors Felix Noble Mirayma V. Rodriguez Agnes Velez Electric and Computer Engineer Department August 25, 2004.
The MIPS R10000 Superscalar Microprocessor Kenneth C. Yeager Nishanth Haranahalli February 11, 2004.
AMD Opteron Overview Michael Trotter (mjt5v) Tim Kang (tjk2n) Jeff Barbieri (jjb3v)
Comparing Intel’s Core with AMD's K8 Microarchitecture IS 3313 December 14 th.
ALPHA Introduction I- Stream ALPHA Introduction I- Stream Dharmesh Parikh.
Computer Architecture System Interface Units Iolanthe II approaches Coromandel Harbour.
Dynamic Pipelines. Interstage Buffers Superscalar Pipeline Stages In Program Order In Program Order Out of Order.
The original MIPS I CPU ISA has been extended forward three times The practical result is that a processor implementing MIPS IV is also able to run MIPS.
Different Microprocessors Tamanna Haque Nipa Lecturer Dept. of Computer Science Stamford University Bangladesh.
On-chip Parallelism Alvin R. Lebeck CPS 221 Week 13, Lecture 2.
UltraSPARC III Hari P. Ananthanarayanan Anand S. Rajan.
The Intel 86 Family of Processors
Computer Architecture System Interface Units Iolanthe II in the Bay of Islands.
AMD K-6 Processor Evaluation. Registers AMD-K6 Registers General purpose registers Segment registers Floating point registers MMX registers EFLAGS register.
The life of an instruction in EV6 pipeline Constantinos Kourouyiannis.
The Alpha – Data Stream Matt Ziegler.
1 Adapted from UC Berkeley CS252 S01 Lecture 18: Reducing Cache Hit Time and Main Memory Design Virtucal Cache, pipelined cache, cache summary, main memory.
Sun Microsystems’ UltraSPARC-IIi a Stunt-Free Presentation by Christine Munson Amanda Peters Carl Sadler.
Jan. 5, 2000Systems Architecture II1 Machine Organization (CS 570) Lecture 1: Overview of High Performance Processors * Jeremy R. Johnson Wed. Sept. 27,
On-chip Parallelism Alvin R. Lebeck CPS 220/ECE 252.
High Performance Computing1 High Performance Computing (CS 680) Lecture 2a: Overview of High Performance Processors * Jeremy R. Johnson *This lecture was.
Chapter 11 System Performance Enhancement. Basic Operation of a Computer l Program is loaded into memory l Instruction is fetched from memory l Operands.
Microprocessor Microarchitecture Memory Hierarchy Optimization Lynn Choi Dept. Of Computer and Electronics Engineering.
UltraSparc IV Tolga TOLGAY. OUTLINE Introduction History What is new? Chip Multitreading Pipeline Cache Branch Prediction Conclusion Introduction History.
Niagara: A 32-Way Multithreaded Sparc Processor Kongetira, Aingaran, Olukotun Presentation by: Mohamed Abuobaida Mohamed For COE502 : Parallel Processing.
PipeliningPipelining Computer Architecture (Fall 2006)
1 Lecture 20: OOO, Memory Hierarchy Today’s topics:  Out-of-order execution  Cache basics.
Protection in Virtual Mode
Instruction Level Parallelism
Visit for more Learning Resources
PowerPC 604 Superscalar Microprocessor
Computer architectures M
Flow Path Model of Superscalars
Introduction to Pentium Processor
The Microarchitecture of the Pentium 4 processor
Comparison of Two Processors
Alpha Microarchitecture
Lecture 20: OOO, Memory Hierarchy
* From AMD 1996 Publication #18522 Revision E
ARM920T Processor This training module provides an introduction to the ARM920T processor embedded in the AT91RM9200 microcontroller.We’ll identify the.
Presentation transcript:

THE AMD-K7 TM PROCESSOR Microprocessor Forum 1998 Dirk Meyer

AMD Processor Roadmap Microprocessor Forum 1997 AMD-K6 ® AMD-K6 AMD-K6-2 Sharptooth 1H’972H’971H’982H’981H’99 Performance.25µ process 100 MHz Bus On-chip, Full Speed Backside L2 100 MHz Frontside L3 Superscalar MMX Technology 3DNow! Technology.35µ process MMX TM Technology Enhanced.25µ process Lower Power Higher Speeds.25µ process 100 MHz Bus 100 MHz Frontside L2 Superscalar MMX Technology 3DNow! TM Technology µP Forum ‘98 AMD-K7 TM

AMD-K7 ™ Processor Overview  Superior 7th Generation CPU Design  L eading Performance in Integer, Floating point, and Multimedia  Operating Frequencies of 500 MHz+ using 0.25  m Technology  High-speed Alpha™ EV6 Bus Technology  High-speed Backside Level 2 Cache Controller  Scalable Multiprocessing Architecture for Workstation and Server Markets  Processor Module for Standard Motherboard Form Factors  Optimized Chipsets, Motherboards and BIOS

AMD-K7 ™ Processor Architecture  Three Parallel x86 Instruction Decoders  9-issue Superscalar Microarchitecture Optimized for High Frequency  Dynamic Scheduling with Speculative, Out-of-Order Execution  2048-entry Branch Prediction Table & 12-entry Return Stack  3 Superscalar, Out-of-Order Integer Pipelines each Containing:  Integer Execution Unit  Address Generation Unit  3 Superscalar, Out-of-Order Multimedia Pipelines with 1-cycle throughput  FADD (4 cyc latency), MMX ALU (2 cyc latency), 3DNow!  FMUL (4 cyc latency), MMX ALU (includes Mul & MAC), 3DNow!  FSTORE  Level 1 64K I-Cache & 64K D-Cache, each 2-way Set Associative  Multi-level TLB (24/256-Entry I, 32/256-Entry D)

AMD-K7 ™ Processor Architecture (Con’t)  Two General Purpose 64-bit Load/Store Ports into D-Cache  3-Cycle Load Latency  Multi-banking Allows Concurrent Access by 2 Load/Stores  High-speed 64-bit Backside L2 Cache Controller  Supports Sizes of 512KB to 8MB  Programmable Interface Speeds  High-speed 64-bit System Interface  First Mainstream Systems to have a 200MHz Bus  Significant Headroom for Future  Deep Internal Buffering to Support Pipelines and External Interfaces  Up to 72 x86 instructions in-flight  32 outstanding load misses  15-entry integer scheduler  36-entry floating point scheduler

Microarchitecture Terminology  x86 Instructions are sent to one of two Decoding Pipelines:  DirectPath: Decodes common x86 instructions (1-15 byte lengths)  VectorPath: Decodes uncommon, complex x86 instructions  Decoding Pipelines can dispatch 3 MacroOps to Execution Unit Schedulers  Each MacroOp consists of one or two Operations (OPs)  OPs are issued to the execution units ADD EAX, EBX XOREAX, [EBX+8] AND[EBX], EAX 1 DirectPath MacroOp - 1 OP (ADD) - 1 OP (ADD) 1 DirectPath MacroOp - 1 OP (LOAD) - 1 OP (LOAD) - 1 OP (XOR) - 1 OP (XOR) 1 DirectPath MacroOp - 1 OP (LOAD/STORE) - 1 OP (LOAD/STORE) - 1 OP (AND) - 1 OP (AND)

Microarchitecture Pipeline Integer Pipeline Floating Point Pipeline

AMD-K7 ™ Processor Block Diagram Load / Store Queue Unit IEUAGU Instruction Control Unit (72-entry) Fetch/Decode Control 2-way, 64KB Data Cache 32-entry L1 TLB/256-entry L2 TLB 3-Way x86 Instruction Decoders FPU Register File (88-entry) FADD MMX 3DNow! FStore FMUL MMX 3DNow! IEU Integer Scheduler (15-entry) FPU Stack Map / Rename L2 SRAMs System Interface 2-way, 64KB Instruction Cache 24-entry L1 TLB/256-entry L2 TLB Predecode Cache Branch Prediction Table L2 Cache Controller Bus Interface Unit FPU Scheduler (36-entry) AGUIEUAGU

x86 Instruction Decoders

Integer Execution Units  Three Integer Execution Units (IEU)  Three Address Generation Unit (AGU)  15-entry Integer Scheduler  Full Out-of-Order Speculative Execution  Multiplier IEU1 Instruction Control Unit and Register Files Integer Multiply (IMUL) IEU0 AGU0 AGU1 IEU2 AGU2 MacroOPs Integer Scheduler (15-entry) MacroOPs

Superscalar Multimedia Execution Units  Three Superscalar Multimedia Execution Units  3-issue, Out-of-Order, Fully Pipelined Design  Separate Register File Instruction Control Unit FADD MMX ALU 3DNow! FADD MMX ALU 3DNow! FSTORE FMUL MMX ALU MMX Mul 3DNow! FMUL MMX ALU MMX Mul 3DNow! Stack Map Register Rename Scheduler (36-entry) FPU Register File (88-entry)

Load-Store Unit and Data Cache  Load Store Unit (LSU)  44-entry Load/Store queue  Data forwarding from stores to dependent loads  2-way, 64KB Dual-Ported Data Cache  MOESI coherency, 64 byte line size  32-entry L1 DTLB and 4-way, 256-entry L2 DTLB  3 sets of data cache tags Data Cache 2-way,64KB LSU 44-entry Result Busses from Core Operand Busses Store Data to BIU

System Interface Controller Internals I-Miss Address Buffers (2) D-Miss Address Buffers (6) Victim Address Buffers (8) Snoop Address Buffers (8 ) Write Address Buffers (4) D-Cache Snoop Tags I-Cache Snoop Tags L2 Internal Tags L2 Controller System Interface Controller

AGP PCI DRAM System Logic Forward Clocks AMD-K7 PROCESSOR L2 CACHE Snoops/SysCMD Requests 72b Data 72b System and L2 Cache Interfaces  Alpha EV6 Bus Protocol  Point-to-Point Topology with Clock Forwarding  Decoupled Address and Data Busses  72-bit Data Bus w/ ECC  Independent Address/Request Bus  Independent Snoop Bus  Up to 20 Outstanding Transactions per Processor  Scalable Multiprocessing  L2 Cache Interface  512KB to 8MB using Industry- Standard SRAMs  Programmable Interface Speeds  Low-voltage Signaling AMD-K7 PROCESSOR L2 CACHE Snoops/SysCMD Requests 72b Data 72b

AMD-K7 ™ Processor Infrastructure  Chipsets  Performance-optimized AMD-K7 chipsets are planned from both AMD and leading third-party vendors in 1999  Motherboards  High quality, performance-optimized AMD-K7 motherboards are planned from leading vendors at launch in 1H99  BIOS  Production BIOS are planned from all leading suppliers including AMI, Award and Phoenix  Mechanical  The AMD-K7 processor will utilize existing industry-standard physical/mechanical infrastructure components including cases, power supplies, fans, heat sinks, etc.

AMD-K7 ™ Processor Summary  Superior 7th Generation Processor Architecture  Advanced Processor Core Design  Leading Edge Frequencies: 500MHz+ using 0.25  m Technology  High Performance System Interface with low-voltage swing Point-to- Point Topology and Clock Forwarding Technology  Scalable Multiprocessing Architecture  AMD-K7 Processor Module, Chipsets, Motherboards  Leading Edge Silicon Technology  Fab 25 and Fab 30 Provide Volume Production Capacity