Better answers The Alpha 21364 and 21464 Microprocessors: Continuing the Performance Lead Beyond Y2K Shubu Mukherjee, Ph.D. Principal Hardware Engineer.

Presentation transcript:

The Alpha 21364 and 21464 Microprocessors: Continuing the Performance Lead Beyond Y2K
Shubu Mukherjee, Ph.D.
Principal Hardware Engineer
VSSAD Labs, Alpha Development Group
Compaq Computer Corporation
Shrewsbury, Massachusetts
Slides: 1998 Microprocessor Forum (Peter Bannon) and 1999 Microprocessor Forum (Joel Emer)

Alpha Microprocessor Roadmap
[Figure: Alpha EV generations, including the 21364, plotted by process technology (0.18 µm among the labels). Axis labels: Higher Performance, Lower Cost, First System Ship.]

Alpha Microprocessor
• Architectural Features
  - First “Out-of-Order” Alpha
  - Four-wide superscalar
  - …
• Performance
  - World’s Fastest Microprocessor (11/17/99)
  - 39 SPECINT95, [...] MHz
  - Intel Pentium 733 MHz delivers 36 SPECINT95, 30 SPECFP95

Alpha Microprocessor Roadmap
[Figure: Alpha EV generations, including the 21364, plotted by process technology (0.18 µm among the labels). Axis labels: Higher Performance, Lower Cost, First System Ship.]

Alpha 21364 Goals
• Leadership single stream performance
  - Higher operating frequency
  - Integrated memory interface
• Leadership multiprocessor performance
  - Integrated system / multiprocessor interface

Alpha 21364 Features
• System-on-a-Chip
  - Alpha core with enhancements
  - Integrated L2 Cache
  - Integrated memory controller
  - Integrated network interface
• Fault-Tolerance
  - Support for lock-step operation to enable high-availability systems

Chip Block Diagram
[Figure: blocks labeled Core, 64K Icache, 64K Dcache, 16 L1 Miss Buffers, 16 L1 Victim Buf, 16 L2 Victim Buf, L2 Cache, Memory Controller (RAMBUS), Network Interface (N, S, E, W, I/O), Address In, Address Out.]

Core
[Figure: pipeline diagram. Stages: FETCH, MAP, QUEUE, REG, EXEC, DCACHE.]
• Fetch: Branch Predictors, Next-Line Address, L1 Ins. Cache 64KB 2-Set
• Map: Int Reg Map, FP Reg Map
• Queue: Int Issue Queue (20), FP Issue Queue (15)
• Registers: Reg File (80), Reg File (80), Reg File (72)
• Execute: Exec, Addr, Exec, Addr, FP ADD Div/Sqrt, FP MUL
• Memory: L1 Data Cache 64KB 2-Set, Victim Buffer, Miss Address, L2 cache 1.5MB 6-Set
• 4 Instructions / cycle
• 80 in-flight instructions plus 32 loads and 32 stores

Integrated L2 Cache
• 1.5 MB
• 6-way set associative
• 16 GB/s total read/write bandwidth
• 16 Victim buffers for L1 -> L2
• 16 Victim buffers for L2 -> Memory
• ECC SECDED code
• 12 ns load-to-use latency
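As a worked example of the capacity and associativity figures on this slide, the sketch below derives how many sets such a cache would have. The 64-byte line size is an assumption (the slide does not state it), and the names `cache_sets`, `L2_BYTES`, `L2_WAYS` and `LINE_BYTES` are illustrative only.

```python
def cache_sets(capacity_bytes: int, ways: int, line_bytes: int) -> int:
    """Return the number of sets in a set-associative cache."""
    bytes_per_set = ways * line_bytes
    if capacity_bytes % bytes_per_set != 0:
        raise ValueError("capacity must be a whole number of sets")
    return capacity_bytes // bytes_per_set


L2_BYTES = 3 * 512 * 1024   # 1.5 MB, taken here as 1.5 * 2**20 bytes
L2_WAYS = 6                 # 6-way set associative (from the slide)
LINE_BYTES = 64             # ASSUMPTION: line size is not given on the slide

sets = cache_sets(L2_BYTES, L2_WAYS, LINE_BYTES)
index_bits = sets.bit_length() - 1   # valid because sets is a power of two here

print(f"{sets} sets, {index_bits} index bits")
```

Under the assumed line size, six ways over 1.5 MB gives a power-of-two number of sets, so the set index is a plain bit field of the address even though the capacity itself is not a power of two.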

Integrated Memory Controller
• Direct RAMbus
  - High data capacity per pin
  - 800 MHz operation
  - 30 ns CAS latency pin to pin
• 6 GB/sec read or write bandwidth
• 100s of open pages
• Directory based cache coherence
• ECC SECDED

Integrated Network Interface
• Direct processor-to-processor interconnect
• 10 GB/second per processor
• 15 ns processor-to-processor latency
• Out-of-order network with adaptive routing
• Asynchronous clocking between processors
• 3 GB/second I/O interface per processor

System Block Diagram
[Figure: twelve processor nodes labeled 364, each with an M block and an IO block attached.]
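The N, S, E and W ports in the chip block diagram suggest that each processor links to four neighbors in a two-dimensional grid. The sketch below shows neighbor lookup and minimum hop count under the assumption that the grid wraps around at its edges (a torus) and that the twelve nodes are arranged 4 by 3; neither detail is stated on the slides, and the function names are hypothetical.

```python
from typing import Dict, Tuple

Node = Tuple[int, int]  # (column, row)


def torus_neighbors(node: Node, cols: int, rows: int) -> Dict[str, Node]:
    """Map each port name to the neighboring node, wrapping at the edges."""
    x, y = node
    return {
        "N": (x, (y - 1) % rows),
        "S": (x, (y + 1) % rows),
        "E": ((x + 1) % cols, y),
        "W": ((x - 1) % cols, y),
    }


def torus_hops(src: Node, dst: Node, cols: int, rows: int) -> int:
    """Minimum number of links between two nodes on a wrapped grid."""
    dx = abs(src[0] - dst[0])
    dy = abs(src[1] - dst[1])
    # In each dimension, take the shorter way around the ring.
    return min(dx, cols - dx) + min(dy, rows - dy)


COLS, ROWS = 4, 3   # ASSUMPTION: a 4 x 3 layout for the twelve nodes

print(torus_neighbors((0, 0), COLS, ROWS))
print(torus_hops((0, 0), (3, 2), COLS, ROWS))
```

With wraparound links, opposite corners of the grid are two hops apart instead of five, which is one reason a wrapped grid is attractive for a glueless multiprocessor.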

Alpha 21364 Technology
• 0.18 µm CMOS
• [...] MHz
• [...] volts
• 3.5 cm²
• 6 Layer Metal
• 100 million transistors
  - 8 million logic
  - 92 million RAM

Alpha 21364 Status
• 70 SPECint95 (estimated)
• 120 SPECfp95 (estimated)
• RTL model running
• Tapeout: Summer 2000

Summary: System on a Chip
• Integrated L2 cache and memory controller
  - outstanding single processor performance
• Integrated network interface
  - high performance multi-processor systems
  - scales to large number of processors

Alpha Microprocessor Overview
[Figure: Alpha EV generations, including the 21364, plotted by process technology (0.18 µm among the labels). Axis labels: Higher Performance, Lower Cost, First System Ship.]

Alpha 21464 Goals
• Leadership single stream performance
  - Higher operating frequency / better technology
  - New microarchitecture
  - Integrated memory interface (like 21364)
• Leadership multiprocessor performance
  - Simultaneous Multithreading (with minimal change/cost)
  - Integrated system / multiprocessor interface (like 21364)

Alpha 21464 Technology Overview
• Leading edge process technology: [...] GHz
  - 0.125 µm CMOS
  - SOI-compatible
  - Cu interconnect
  - low-k dielectrics
• Chip characteristics
  - ~1.2 V Vdd
  - ~250 million transistors

Alpha 21464 Architecture Overview
• Enhanced out-of-order execution
• 8-wide superscalar
• Large on-chip L2 cache
• Direct RAMBUS interface
• On-chip router for system interconnect
• Glueless, directory-based, ccNUMA
  - for up to 512-way multiprocessing
• 4-way simultaneous multithreading (SMT)

Instruction Issue
[Figure: function unit issue slots over time.]
Reduced function unit utilization due to dependencies

Superscalar Issue
[Figure: function unit issue slots over time.]
Superscalar leads to more performance, but lower utilization

Predicated Issue
[Figure: function unit issue slots over time.]
Adds to function unit utilization, but results are thrown away

Chip Multiprocessor
[Figure: function unit issue slots over time.]
Limited utilization when only running one thread

Fine Grained Multithreading
[Figure: function unit issue slots over time.]
Intra-thread dependencies still limit performance

Simultaneous Multithreading
[Figure: function unit issue slots over time.]
Maximum utilization of function units by independent operations
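The utilization argument in the preceding slides can be illustrated with a toy model. The sketch below is not taken from the presentation: the per-cycle counts of ready operations in `READY` are made up, the function names are hypothetical, and the model simplifies by discarding operations that are not issued in their cycle instead of carrying them forward.

```python
from typing import List

Ready = List[List[int]]  # READY[thread][cycle] = independent ops ready to issue


def utilization(issued_per_cycle: List[int], width: int) -> float:
    """Fraction of issue slots that were filled."""
    return sum(issued_per_cycle) / (width * len(issued_per_cycle))


def superscalar(ready: Ready, width: int) -> List[int]:
    """Single thread: only thread 0 can fill the issue slots."""
    return [min(width, n) for n in ready[0]]


def fine_grained(ready: Ready, width: int) -> List[int]:
    """One thread per cycle, chosen round-robin."""
    threads = len(ready)
    cycles = len(ready[0])
    return [min(width, ready[c % threads][c]) for c in range(cycles)]


def smt(ready: Ready, width: int) -> List[int]:
    """Any mix of threads may fill the issue slots in a cycle."""
    cycles = len(ready[0])
    return [min(width, sum(t[c] for t in ready)) for c in range(cycles)]


# ASSUMPTION: invented workload, four threads over four cycles.
READY = [
    [2, 0, 1, 3],
    [1, 2, 0, 1],
    [0, 1, 2, 0],
    [1, 1, 1, 2],
]
WIDTH = 4  # issue slots per cycle

for name, policy in [("superscalar", superscalar),
                     ("fine-grained", fine_grained),
                     ("SMT", smt)]:
    print(f"{name:12s} {utilization(policy(READY, WIDTH), WIDTH):.3f}")
```

For this particular workload the single-threaded policy fills 6 of 16 slots, round-robin fills 8, and the SMT policy fills all 16, because slots left empty by one thread's dependencies are taken by independent operations from the others.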

Basic Out-of-order Pipeline
[Figure: pipeline diagram. Stages: Fetch, Decode/Map, Queue, Reg Read, Execute, Dcache/Store Buffer, Reg Write, Retire. Structures: PC, Icache, Register Map, Regs, Dcache. Label: Thread-blind.]

SMT Pipeline
[Figure: pipeline diagram. Stages: Fetch, Decode/Map, Queue, Reg Read, Execute, Dcache/Store Buffer, Reg Write, Retire. Structures: PC, Icache, Register Map, Regs, Dcache.]

Changes for SMT
• Basic pipeline: unchanged
• Replicated resources
  - Program counters
  - Register maps
• Shared resources
  - Register file (size increased)
  - Instruction queue
  - First and second level caches
  - Translation buffers
  - Branch predictor
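The split between replicated and shared resources can be sketched as a small data structure. This is an illustration only, not the 21464 design: the class and method names are hypothetical, the register counts are arbitrary, and the model never frees a physical register.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ThreadContext:
    """Replicated per thread: a program counter and a register map."""
    pc: int = 0
    register_map: Dict[str, int] = field(default_factory=dict)


@dataclass
class SMTCore:
    """One core whose physical register file is shared by all threads."""
    num_threads: int
    num_physical_regs: int
    threads: List[ThreadContext] = field(init=False)
    free_regs: List[int] = field(init=False)

    def __post_init__(self) -> None:
        self.threads = [ThreadContext() for _ in range(self.num_threads)]
        self.free_regs = list(range(self.num_physical_regs))

    def rename_dest(self, thread_id: int, arch_reg: str) -> int:
        """Give a thread's architectural register a new physical register."""
        if not self.free_regs:
            raise RuntimeError("no free physical registers; thread must stall")
        phys = self.free_regs.pop(0)   # taken from the shared pool
        self.threads[thread_id].register_map[arch_reg] = phys
        return phys


# ASSUMPTION: tiny sizes chosen for readability, not taken from the slides.
core = SMTCore(num_threads=4, num_physical_regs=8)
p0 = core.rename_dest(0, "r1")
p1 = core.rename_dest(1, "r1")
print(p0, p1, len(core.free_regs))
```

Two threads writing the same architectural register receive different physical registers, so stages after the map stage need no knowledge of which thread an instruction belongs to. Because every thread draws from one pool, the shared register file has to be larger than in a single-threaded core.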

Multiprogrammed workload
[Figure]

Decomposed SPEC95 Applications
[Figure]

Multithreaded Applications
[Figure]

Architectural Abstraction
• 1 Processor with 4 Thread Processing Units (TPUs)
• Shared hardware resources
[Figure: TPU0, TPU1, TPU2 and TPU3 above shared Icache, TLB, Dcache and Scache.]

System Block Diagram
[Figure: nine processor nodes labeled EV8, each with an M block and an IO block attached; one group of labels 0, 1, 2, 3.]

Alpha 21464 Summary
• Leadership single stream performance
  - Higher operating frequency / better technology
  - New microarchitecture
  - Integrated memory interface (like 21364)
• Leadership multiprocessor performance
  - Simultaneous Multithreading (with minimal changes/cost)
  - Integrated system / multiprocessor interface (like 21364)

• Alpha 21364
  - Reuses microprocessor core
  - System on a chip
• Alpha 21464
  - New microarchitecture
  - System on a chip
  - Simultaneous Multithreading
Maintain Performance Lead Beyond Y2K

My Current Research: Beyond 21464?
• The Truth Project (w/ Joel Emer)
  - Examines different microarchitectural issues
• The Multinet Project (w/ Rick Kessler)
  - Tightly-coupled multiprocessor networks
• The Reliant Project (w/ Steve Reinhardt)
  - Self-Checking Microprocessors using SMT, ISCA submission
• Asim (w/ VSSAD Labs)
  - Performance Model for Alphas beyond 21464