Advanced Micro Devices - Athlon
Buddy Guest, Mike Lewitt, Bill McCorkle
November 28, 2001



What Have We Seen So Far? RISC, IA-64, IA-32. Where Is the Competition?

Overview of Today's Events
- Company History
- Differences in AMD Athlon Architecture: System Bus; Macro vs. Micro Operations; Floating Point Operations; Branch Prediction; Memory Management
- Comparing Processor Performance

AMD
- May 1, 1969: founded as a semiconductor company
- Signs cross-licensing agreement with Intel
- 1987: AMD and Intel go to court
- 1991: Am386 (breaks Intel's monopoly)
- 1992: Court awards AMD full rights to produce the Am386 processor
- 1993: Am…
- AMD-K…
- Athlon: first 7th-generation processor
Intel
- July 18, 1968: founded
- Semiconductor memory introduced
- 1976: signs cross-licensing agreement with AMD
- …-bit processor (on-board memory)
- …-bit processor
- Pentium
- 1998: Celeron and Pentium II

Architecture Summary
- AMD approach: a balanced approach that optimizes processor performance (↑ IPC) while improving the operating frequency at the same time.
- Intel approach: increased pipeline depth to handle more instructions, which cost processor performance (↓ IPC). Solution: compensate with a much higher frequency to stay competitive (performance = IPC × frequency).
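The trade-off above can be made concrete with the relation performance = IPC × frequency. The sketch below uses made-up IPC and clock values (they are not measurements of either processor) to show how a deeper pipeline with lower IPC can reach the same instruction throughput as a balanced design by running at a higher clock.

```python
# Minimal sketch: performance (instructions per second) = IPC x clock frequency.
# The IPC and frequency values are hypothetical, chosen only for illustration.


def performance(ipc: float, freq_hz: float) -> float:
    """Return instructions executed per second."""
    return ipc * freq_hz


balanced = performance(ipc=1.2, freq_hz=1.0e9)   # hypothetical balanced design
deep_pipe = performance(ipc=0.8, freq_hz=1.5e9)  # hypothetical deep pipeline

print(f"balanced:  {balanced:.3e} instructions/s")
print(f"deep pipe: {deep_pipe:.3e} instructions/s")
```

Both hypothetical designs land at about 1.2 billion instructions per second: the deep pipeline gives up a third of its IPC and wins it back with a 50% faster clock.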

Architecture Summary: Overall Improvement to Performance
- Frequency improvements: smaller geometries; faster transistors ("process shrinks"); deeper pipelines; fewer gates per clock cycle
- Work-per-clock improvements: superscalar architectures; dynamic instruction schedulers; larger on-chip caches; advanced branch prediction

Architecture Summary: Clock Speed / EV6 Bus
- Designed with very high clock speeds in mind
- The K7 has very deep buffers to enable those clock speeds, allowing up to 72 x86 instructions in flight
- The bus transfers data on both the rising and the falling clock edge:
  100 MHz clock → 200 MHz effective bus rate
  133 MHz clock → 266 MHz effective bus rate
- AMD vs. Intel compared at the same clock

Architecture Summary: EV6 Bus on AMD Athlon
- Scalable up to 200 MHz, yielding an effective frequency of 400 MHz
- Multiprocessor support
- Highest bus bandwidth: 1.60 GB/s (100 MHz clock, 200 MHz effective, 8-byte data bus)
- Intel uses a 133 MHz bus (about 1.06 GB/s)
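The bandwidth figures follow from clock × transfers per clock × bus width. The sketch below assumes a 64-bit (8-byte) data bus on both platforms; a double-pumped bus (data on both clock edges, as described for EV6) makes 2 transfers per clock, a single-pumped bus makes 1. It is simple arithmetic, not a model of either bus protocol.

```python
# Sketch: peak bus bandwidth = clock x transfers per clock x bus width.
# Assumes an 8-byte (64-bit) data bus; 1 GB is taken as 10**9 bytes.

BUS_WIDTH_BYTES = 8


def peak_bandwidth(clock_mhz: float, double_pumped: bool) -> float:
    """Return peak bandwidth in GB/s."""
    transfers_per_clock = 2 if double_pumped else 1
    bytes_per_second = clock_mhz * 1e6 * transfers_per_clock * BUS_WIDTH_BYTES
    return bytes_per_second / 1e9


print(peak_bandwidth(100, double_pumped=True))   # EV6 at 100 MHz: 1.6
print(peak_bandwidth(200, double_pumped=True))   # EV6 scaled to 200 MHz: 3.2
print(peak_bandwidth(133, double_pumped=False))  # 133 MHz single-pumped: 1.064
```

Under these assumptions, scaling EV6 to a 200 MHz clock would double the 1.6 GB/s peak to 3.2 GB/s.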

AMD Athlon vs. PIII

Architecture Summary: Instruction Control Unit
- Holds 72 macro-ops (MOps) before assignment; a MOp corresponds roughly to one x86 instruction, so the Athlon can have 72 instructions "in flight"
- The P6 holds only 13 in-flight MOps
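An "in-flight" window like the 72-entry ICU is commonly modeled as a reorder buffer: instructions enter in program order, may finish out of order, and retire in order from the head. The class below is a generic textbook sketch (the name `ReorderBuffer` and its methods are hypothetical), not a description of AMD's actual ICU logic.

```python
from collections import deque


class ReorderBuffer:
    """Toy reorder buffer: dispatch in program order, complete in any
    order, retire in program order from the head."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = deque()  # each entry is [tag, completed]

    def dispatch(self, tag: str) -> bool:
        """Add an instruction; return False (stall) if the window is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append([tag, False])
        return True

    def complete(self, tag: str) -> None:
        """Mark an in-flight instruction as finished executing."""
        for entry in self.entries:
            if entry[0] == tag:
                entry[1] = True
                return
        raise KeyError(tag)

    def retire(self) -> list:
        """Retire completed instructions from the head, stopping at the
        first one that has not completed."""
        retired = []
        while self.entries and self.entries[0][1]:
            retired.append(self.entries.popleft()[0])
        return retired


rob = ReorderBuffer(capacity=72)  # window size taken from the slide
for tag in ["i0", "i1", "i2"]:
    rob.dispatch(tag)
rob.complete("i2")                # finishes early, out of order
print(rob.retire())               # [] because i0 is still executing
rob.complete("i0")
rob.complete("i1")
print(rob.retire())               # ['i0', 'i1', 'i2']
```

A larger capacity lets the front end keep dispatching past a slow instruction instead of stalling, which is the point of a deep in-flight window.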

Architecture Summary: Execution Ports
- AMD has no fewer than 9
- Intel has 5, 2 of which are dedicated to memory stores
- Enhanced parallelism inside the Athlon
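One way to see why port count matters is a port-pressure bound: if each class of operation can issue only on its own ports, one operation per port per cycle, a block of code needs at least as many cycles as its most crowded port class. The sketch ignores dependencies and latencies, and the operation mix and port layouts are hypothetical, not the real Athlon or P6 port assignments.

```python
import math


def min_issue_cycles(op_counts: dict, ports: dict) -> int:
    """Lower bound on cycles when each operation class can issue only on
    its own ports, one operation per port per cycle. Ignores dependencies."""
    return max(math.ceil(op_counts[kind] / ports[kind]) for kind in op_counts)


ops = {"alu": 12, "store": 4}                          # hypothetical mix
print(min_issue_cycles(ops, {"alu": 3, "store": 1}))  # 4
print(min_issue_cycles(ops, {"alu": 2, "store": 2}))  # 6
```

With the same total of four ports, the layout that matches the workload's mix finishes in 4 cycles instead of 6; more ports overall raise the bound for every mix.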

Micro-Ops / Macro-Ops
- The Athlon has 3 parallel x86 instruction decoders that translate instructions into macro-ops for the 72-entry ICU
- Uses 2 decode paths (Intel uses 1):
  - Decoding common instructions (direct path)
  - Decoding complex x86 instructions (vector path)
- The integer scheduler is fed from the ICU and holds at most 15 macro-ops, representing 30 operations at a time
- The scheduler feeds 3 parallel integer execution units

Micro-Ops / Macro-Ops
Athlon decoders (3-way instruction decode)
- 3 parallel decoding units
- All three decoders are "fully capable," so any combination of instructions can go to any of its decoders
- Handles complex and simple instructions
Intel decoders
- 3 parallel decoding units: 1 complex, 2 simple
- Handles complex / simple / simple
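The effect of the decoder templates can be sketched by counting cycles for an in-order stream of simple ("S") and complex ("C") instructions. Each cycle, instructions fill decoder slots in program order and decoding stops at the first instruction a slot cannot accept. This is a simplification under stated assumptions: it treats all Athlon decoders as interchangeable (as the slide does) and ignores details such as vector-path instructions occupying all decoders.

```python
def decode_cycles(stream: str, slot_accepts_complex: list) -> int:
    """Count cycles to decode `stream` ('S' simple, 'C' complex) in order.
    slot_accepts_complex[i] says whether decoder slot i can take complex
    instructions. Slot 0 must accept complex ones, or decoding could stall."""
    if not slot_accepts_complex or not slot_accepts_complex[0]:
        raise ValueError("slot 0 must accept complex instructions")
    cycles = 0
    i = 0
    while i < len(stream):
        cycles += 1
        for accepts_complex in slot_accepts_complex:
            if i >= len(stream):
                break
            if stream[i] == "C" and not accepts_complex:
                break  # in-order decode: this instruction waits a cycle
            i += 1
    return cycles


ATHLON = [True, True, True]  # three fully capable decoders (per the slide)
P6 = [True, False, False]    # one complex + two simple decoders

print(decode_cycles("SCSC", ATHLON))  # 2
print(decode_cycles("SCSC", P6))      # 3
```

On the complex/simple/simple template, a complex instruction that lands in the second or third slot has to wait for the next cycle, which is why the mixed stream takes an extra cycle.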

3DNOW!
                                   3DNow! (Athlon)   SSE (Intel)
Pipelines (parallel)               2                 2
Instructions (how wide)            2                 4
Effective instructions per cycle   4*                4
Registers used                     3DNow! / FPU      No FPU**
Every 4-wide Intel SSE instruction is actually 2 Athlon micro-ops.
*AMD takes advantage of the rising edge as well as the falling edge.
**SSE cannot be used with MMX registers.
MMX was developed when FPUs were not as important.
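The "effective instructions per cycle" row can be reproduced with a simple peak-rate calculation, assuming each pipeline accepts one micro-op per cycle and a packed instruction of a given width is split into a number of micro-ops (the footnote's point that a 4-wide SSE instruction becomes 2 micro-ops). This is an idealized peak figure, not a measured throughput.

```python
def effective_ops_per_cycle(pipelines: int, width: int, uops_per_instr: int) -> float:
    """Peak packed results per cycle, assuming each pipeline accepts one
    micro-op per cycle and an instruction of `width` lanes is split into
    `uops_per_instr` micro-ops."""
    return pipelines * width / uops_per_instr


print(effective_ops_per_cycle(2, 2, 1))  # 3DNow!: 4.0
print(effective_ops_per_cycle(2, 4, 2))  # 4-wide SSE split into 2 micro-ops: 4.0
```

Under these assumptions, SSE's wider instructions are offset by the split into two micro-ops, so both extensions peak at the same rate.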

3DNOW!
Each pipeline can execute any of the instructions listed above. The second pipeline can execute any instruction in any group except the group the first pipeline has chosen.
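The pairing rule can be sketched as a greedy in-order dual-issue loop: the first instruction takes pipeline 1, and the next one joins it in the same cycle only if it belongs to a different group. The group labels below are hypothetical placeholders, since the slide's group table is not part of this transcript.

```python
def issue_cycles(groups: list) -> int:
    """Greedy in-order dual issue: the first instruction goes to pipeline 1;
    the next goes to pipeline 2 in the same cycle only if its group differs."""
    cycles = 0
    i = 0
    while i < len(groups):
        cycles += 1
        first = groups[i]
        i += 1
        if i < len(groups) and groups[i] != first:
            i += 1  # pairs with the first instruction
    return cycles


# Hypothetical group labels for a short instruction sequence.
print(issue_cycles(["add", "mul", "add", "add"]))  # 3
```

Two back-to-back instructions from the same group cannot pair, so the last "add" issues alone.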

3DNOW!
Conclusion of 3DNow! vs. SSE
- Both have pairing restrictions
- SSE: separate unit; implementation is more difficult; programs have more freedom
- MMX-add and prefetch instructions are slightly better for SSE
- Final conclusion: DRAW

Full Architecture Views: AMD Athlon vs. PIII

Looking at the ALUs

Floating Point Operations
- Fully pipelined FPU
- 3 parallel floating-point execution units, each on its own port
- The Pentium also has 3, but they sit behind a single port
- The Athlon FPU can execute two 80-bit extended-precision operations; Intel can currently execute only one

Pipelining Differences
- Determining the length
- Execution rate of the pipeline (ALU)
- Degree of parallelism
AMD Athlon vs. Intel Pentium III: integer pipeline length, floating-point pipeline length
15, 25 (AMD Athlon)

Branch Prediction Example:
if (x > 0) { a = 0; b = 1; c = 2; }
d = 3;
Cases: when x > 0; when x < 0; predicting x < 0
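A dynamic predictor learns which way a branch like the `x > 0` test usually goes. The sketch below uses a single 2-bit saturating counter, a standard textbook scheme chosen here for illustration; it is not claimed to be the exact predictor in either the Athlon or the Pentium III. The outcome history is hypothetical, with `True` meaning the `x > 0` test held.

```python
def simulate_2bit(outcomes: list, start: int = 2) -> int:
    """Count correct predictions of one 2-bit saturating counter.
    States 0-1 predict 'not taken', states 2-3 predict 'taken'."""
    state = start
    correct = 0
    for taken in outcomes:
        prediction = state >= 2
        if prediction == taken:
            correct += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct


# Hypothetical history of the x > 0 test: mostly true, one false in the middle.
history = [True] * 5 + [False] + [True] * 4
print(simulate_2bit(history), "of", len(history))  # 9 of 10
```

Because the counter needs two wrong outcomes in a row to flip its prediction, a single `x < 0` case costs only one misprediction.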

Branch Prediction
AMD Athlon
- Branch target buffer of 2048 entries
- Branch history table that can store 4096 entries
Intel Pentium III
- Dynamic branch predictor that can store 512 entries
Approximate correct branch predictions
- AMD Athlon: 95%
- Intel Pentium III: 90-92%
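A few points of prediction accuracy matter because every miss flushes the pipeline. A common back-of-the-envelope model is extra CPI = branch fraction × (1 − accuracy) × misprediction penalty. The accuracies below come from the slide (91% as the midpoint of 90-92%); the branch fraction and the 10-cycle penalty are hypothetical values for illustration only.

```python
def mispredict_cpi(branch_fraction: float, accuracy: float, penalty_cycles: int) -> float:
    """Extra cycles per instruction lost to mispredicted branches."""
    return branch_fraction * (1.0 - accuracy) * penalty_cycles


BRANCH_FRACTION = 0.2  # hypothetical: 1 in 5 instructions is a branch
PENALTY = 10           # hypothetical flush penalty in cycles

for name, accuracy in [("Athlon", 0.95), ("Pentium III", 0.91)]:
    print(f"{name}: +{mispredict_cpi(BRANCH_FRACTION, accuracy, PENALTY):.2f} CPI")
# Athlon: +0.10 CPI
# Pentium III: +0.18 CPI
```

The penalty term also shows why deeper pipelines make accurate prediction more important: a longer pipeline raises the cost of each miss.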

Memory Management
Level 2 cache
- 512 kB to 8 MB
- Runs at 1/3, 1/2, 2/3, or 1/1 of the core clock frequency
- External to the CPU (a weakness of the Athlon)
- Intel L2: 256 kB on-die
- Intel is moving away from Slot 1 and back to sockets
- AMD will need to move to on-die cache and socket connections to stay competitive
- Main push toward the 0.18 µm process
Level 1 cache
- 64 kB data and 64 kB instruction caches (4x the Pentium III)
- Scalability
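Why a slower external L2 is a weakness can be estimated with the standard two-level average memory access time (AMAT) formula, AMAT = L1 hit time + L1 miss rate × (L2 hit time + L2 miss rate × memory latency). The sketch converts an L2 access measured in L2 clocks into core cycles for each clock ratio on the slide. All latencies and miss rates are hypothetical values chosen for illustration.

```python
from fractions import Fraction


def l2_core_cycles(l2_cycles: int, clock_ratio: Fraction) -> Fraction:
    """An L2 clocked at clock_ratio x the core clock needs
    l2_cycles / clock_ratio core cycles for the same access."""
    return l2_cycles / clock_ratio


def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_latency):
    """Two-level average memory access time, in core cycles."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_latency)


# Hypothetical latencies and miss rates, for illustration only.
for ratio in (Fraction(1, 3), Fraction(1, 2), Fraction(2, 3), Fraction(1)):
    l2 = l2_core_cycles(10, ratio)
    t = amat(3, Fraction(1, 20), l2, Fraction(1, 5), 100)
    print(f"L2 at {ratio} x core clock: AMAT = {float(t)} cycles")
```

With these made-up numbers, AMAT falls from 5.5 cycles at a 1/3-speed L2 to 4.5 cycles at full speed, which illustrates the pressure to move the L2 on-die.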

Which One Is Better?
- In the past (286, 386, 486): Performance = Frequency
- In today's world: Performance = IPC × Frequency
- How else do we compare? Benchmarking

Benchmarking: software that performs different tasks to obtain comparisons between processors.
Problems:
- Processor frequencies
- Other processes already running
- Types of programs: some programs are written to take advantage of a certain architecture
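The "other processes already running" problem is often reduced by repeating a measurement and keeping the fastest run, since background activity can only slow a run down. The sketch below uses a hypothetical stand-in workload (a sum of squares) and Python's `time.perf_counter`; it illustrates the measuring technique only and is not one of the benchmarks shown in this presentation.

```python
import time


def workload(n: int) -> int:
    """Hypothetical stand-in task: sum of squares below n."""
    return sum(i * i for i in range(n))


def benchmark(func, *args, repeats: int = 5) -> float:
    """Run func several times and keep the fastest wall-clock time, which
    reduces (but does not remove) noise from other running processes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


elapsed = benchmark(workload, 100_000)
print(f"best of 5: {elapsed * 1e3:.2f} ms")
```

Taking the minimum does not address the other two problems on the slide: results still depend on clock frequency and on how well the workload suits each architecture.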

Photo Editing Software

Animation Software

3D Graphics Editor

3D Gaming

Various Benchmarks

Summary
Over the past couple of years, AMD and Intel have taken different approaches. We have gone over the main architectural differences and shown how the two processors compare. It will be very interesting to see how the market plays out.

Questions?
