Alpha 21364: A Scalable Single-chip SMP Peter Bannon Senior Consulting Engineer Compaq Computer Corporation Shrewsbury, MA.

Slides:



Advertisements
Similar presentations
Main MemoryCS510 Computer ArchitecturesLecture Lecture 15 Main Memory.
Advertisements

A Scalable and Reconfigurable Search Memory Substrate for High Throughput Packet Processing Sangyeun Cho and Rami Melhem Dept. of Computer Science University.
LOGO.  Concept:  Is read-only memory.  Do not lose data when power is lost.  ROM memory is used to produce chips with integrated.
Anshul Kumar, CSE IITD CSL718 : Main Memory 6th Mar, 2006.
Better answers The Alpha and Microprocessors: Continuing the Performance Lead Beyond Y2K Shubu Mukherjee, Ph.D. Principal Hardware Engineer.
THE AMD-K7 TM PROCESSOR Microprocessor Forum 1998 Dirk Meyer.
The AMD Athlon ™ Processor: Future Directions Fred Weber Vice President, Engineering Computation Products Group.
Cache Coherent Distributed Shared Memory. Motivations Small processor count –SMP machines –Single shared memory with multiple processors interconnected.
Main Mem.. CSE 471 Autumn 011 Main Memory The last level in the cache – main memory hierarchy is the main memory made of DRAM chips DRAM parameters (memory.
Intel Labs Labs Copyright © 2000 Intel Corporation. Fall 2000 Inside the Pentium ® 4 Processor Micro-architecture Next Generation IA-32 Micro-architecture.
Chapter 9 Memory Basics Henry Hexmoor1. 2 Memory Definitions  Memory ─ A collection of storage cells together with the necessary circuits to transfer.
EECC756 - Shaaban #1 lec # 13 Spring Scalable Distributed Memory Machines Goal: Parallel machines that can be scaled to hundreds or thousands.
SYNAR Systems Networking and Architecture Group CMPT 886: Architecture of Niagara I Processor Dr. Alexandra Fedorova School of Computing Science SFU.
EECC722 - Shaaban #1 lec # 12 Fall Computer System Components SDRAM PC100/PC MHZ bits wide 2-way interleaved ~ 900 MBYTES/SEC.
VIRAM-1 Architecture Update and Status Christoforos E. Kozyrakis IRAM Retreat January 2000.
Simultaneous Multithreading: Multiplying Alpha Performance Dr. Joel Emer Principal Member Technical Staff Alpha Development Group Compaq.
Spring 2002EECS150 - Lec19-memory Page 1 EECS150 - Digital Design Lecture 18 - Memory April 4, 2002 John Wawrzynek.
EECS 470 Cache and Memory Systems Lecture 14 Coverage: Chapter 5.
Embedded DRAM for a Reconfigurable Array S.Perissakis, Y.Joo 1, J.Ahn 1, A.DeHon, J.Wawrzynek University of California, Berkeley 1 LG Semicon Co., Ltd.
Memory Technology “Non-so-random” Access Technology:
University of Tehran 1 Microprocessor System Design Omid Fatemi Memory Interfacing
The Alpha Network Architecture By Shubhendu S. Mukherjee, Peter Bannon Steven Lang, Aaron Spink, and David Webb Compaq Computer Corporation Presented.
Company LOGO High Performance Processors Miguel J. González Blanco Miguel A. Padilla Puig Felix Rivera Rivas.
My great Computer TOMMY H. My Great Computer  Its main function of the is to play game, can show high equality picture  Can process the application.
1 Reducing DRAM Latencies with an Integrated Memory Hierarchy Design Authors Wei-fen Lin and Steven K. Reinhardt, University of Michigan Doug Burger, University.
Winter 2004 Class Representation For Advanced VLSI Course Instructor : Dr S.M.Fakhraie Presented by : Naser Sedaghati Major Reference : Design and Implementation.
Alpha 21364: A Scalable Single-chip SMP
Shubhendu S. Mukherjee, Peter Bannon, Steven Lang, Aaron Spink, and David Webb Alpha Development Group, Compaq HOT Interconnects 9 (2001) Presented by.
High Performance Computing Processors Felix Noble Mirayma V. Rodriguez Agnes Velez Electric and Computer Engineer Department August 25, 2004.
Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing Barroso, Gharachorloo, McNamara, et. Al Proceedings of the 27 th Annual ISCA, June.
Memory and Storage Dr. Rebhi S. Baraka
CPEN Digital System Design
Egle Cebelyte. Random Access Memory is simply the storage area where all software is loaded and works from; also called working memory storage.
Parallel Programming on the SGI Origin2000 With thanks to Igor Zacharov / Benoit Marchand, SGI Taub Computer Center Technion Moshe Goldberg,
University of Tehran 1 Interface Design DRAM Modules Omid Fatemi
University of Tehran 1 Microprocessor System Design Memory Timing Omid Fatemi
The Alpha Network Architecture Mukherjee, Bannon, Lang, Spink, and Webb Summary Slides by Fred Bower ECE 259, Spring 2004.
ΕΠΛ 605: Προχωρημένη Αρχιτεκτονική Υπολογιστών Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing Luiz André Barroso, Kourosh Gharachorloo,
Operating System Issues in Multi-Processor Systems John Sung Hardware Engineer Compaq Computer Corporation.
Chapter Overview Microprocessors Replacing and Upgrading a CPU.
Sun Starfire: Extending the SMP Envelope Presented by Jen Miller 2/9/2004.
CS/EE 5810 CS/EE 6810 F00: 1 Main Memory. CS/EE 5810 CS/EE 6810 F00: 2 Main Memory Bottom Rung of the Memory Hierarchy 3 important issues –capacity »BellÕs.
Overview High Performance Packet Processing Challenges
Interconnection network network interface and a case study.
CPU/BIOS/BUS CES Industries, Inc. Lesson 8.  Brain of the computer  It is a “Logical Child, that is brain dead”  It can only run programs, and follow.
The Alpha – Data Stream Matt Ziegler.
1 Adapted from UC Berkeley CS252 S01 Lecture 18: Reducing Cache Hit Time and Main Memory Design Virtucal Cache, pipelined cache, cache summary, main memory.
1 Opportunities and Challenges of Modern Communication Architectures: Case Study with QsNet CAC Workshop Santa Fe, NM, 2004 Sameer Kumar* and Laxmikant.
© 2004 IBM Corporation Power Everywhere POWER5 Processor Update Mark Papermaster VP, Technology Development IBM Systems and Technology Group.
1 Memory Hierarchy (I). 2 Outline Random-Access Memory (RAM) Nonvolatile Memory Disk Storage Suggested Reading: 6.1.
Contemporary DRAM memories and optimization of their usage Nebojša Milenković and Vladimir Stanković, Faculty of Electronic Engineering, Niš.
COMPUTER SYSTEMS ARCHITECTURE A NETWORKING APPROACH CHAPTER 12 INTRODUCTION THE MEMORY HIERARCHY CS 147 Nathaniel Gilbert 1.
Submitted To: Submitted By: Seminar On 8086 Microprocessors.
ALPHA 21164PC. Alpha 21164PC High-performance alternative to a Windows NT Personal Computer.
SPRING 2012 Assembly Language. Definition 2 A microprocessor is a silicon chip which forms the core of a microcomputer the concept of what goes into a.
Presented by: Nick Kirchem Feb 13, 2004
Lynn Choi School of Electrical Engineering
Modern Computer Architecture
RAM, CPUs, & BUSES Egle Cebelyte.
CS775: Computer Architecture
Directory-based Protocol
Interconnect with Cache Coherency Manager
Chapter 4: MEMORY.
Peter Bannon Staff Fellow HP
Lecture 24: Virtual Memory, Multiprocessors
Lecture 23: Virtual Memory, Multiprocessors
Presentation transcript:

Alpha 21364: A Scalable Single-chip SMP Peter Bannon Senior Consulting Engineer Compaq Computer Corporation Shrewsbury, MA

Outline Alpha Roadmap Alpha Roadmap Alpha Alpha 21364

Alpha Roadmap EV5/ EV6/ EV68/ PCA56/ PC EV56/  m 0.35  m EV67/  m PCA57/ PC 0.28  m 0.18  m Higher Performance Lower Cost... EV  m EV7/

Estimated time for TPC-C New core Higher MHz Higher integration

Alpha Features Alpha core with enhancements Alpha core with enhancements Integrated L2 Cache Integrated L2 Cache Integrated memory controller Integrated memory controller Integrated network interface Integrated network interface Support for lock-step operation to enable high-availability systems. Support for lock-step operation to enable high-availability systems.

Memory Controller RAMBUSRAMBUS Chip Block Diagram Core 16 L1 Miss Buffers L2 Cache Address Out Address In Network Interface N S E W I/O 16 L1 Victim Buf 16 L2 Victim Buf 64K Icache 64K Dcache

Integrated L2 Cache 1.5 MB 1.5 MB 6-way set associative 6-way set associative 16 GB/s total read/write bandwidth 16 GB/s total read/write bandwidth 16 Victim buffers for L1 -> L2 16 Victim buffers for L1 -> L2 16 Victim buffers for L2 -> Memory 16 Victim buffers for L2 -> Memory ECC SECDED code ECC SECDED code 12ns load to use latency 12ns load to use latency

Integrated Memory Controller Direct RAMbus Direct RAMbus High data capacity per pin High data capacity per pin 800 MHz operation 800 MHz operation 30ns CAS latency pin to pin 30ns CAS latency pin to pin 6 GB/sec read or write bandwidth 6 GB/sec read or write bandwidth 100s of open pages 100s of open pages Directory based cache coherence Directory based cache coherence ECC SECDED ECC SECDED

Integrated Network Interface Direct processor-to-processor interconnect Direct processor-to-processor interconnect 10 GB/second per processor 10 GB/second per processor 15ns processor-to-processor latency 15ns processor-to-processor latency Out-of-order network with adaptive routing Out-of-order network with adaptive routing Asynchronous clocking between processors Asynchronous clocking between processors 3 GB/second I/O interface per processor 3 GB/second I/O interface per processor

21364 System Block Diagram 364 M IO 364 M IO 364 M IO 364 M IO 364 M IO 364 M IO 364 M IO 364 M IO 364 M IO 364 M IO 364 M IO 364 M IO

Alpha Technology 0.18  m CMOS 0.18  m CMOS MHz MHz volts volts 3.5 cm cm 2 6 Layer Metal 6 Layer Metal 100 million transistors 100 million transistors 8 million logic 8 million logic 92 million RAM 92 million RAM