Operating System Issues in Multi-Processor Systems. John Sung, Hardware Engineer, Compaq Computer Corporation (www.compaq.com).



Outline
- Multi-Processor Hardware Issues
- Snoopy Bus System Architecture
- AMD Athlon’s Snoopy Protocol
- ccNUMA System Architecture
- AMD Athlon’s LDT System Bus
- SGI Origin’s ccNUMA System Architecture
- Alpha System Architecture
- ccNUMA and CPU Scheduling
- Conclusion

Multi-Processor Hardware Issues
- Bandwidth/Latency
  - Processor to Processor
  - Processor to Memory
  - Processor to I/O
- Scalability
  - Increase performance as you increase CPU/Memory
- Coherency/Synchronization
  - Give software a coherent view of memory
  - Provide synchronization primitives

Snoopy Bus System Architecture

- A bus connects processors, memory, and I/O
- Scales up to ~16 processors
- Limited by bus bandwidth
- Cache coherency protocol
  - Snoops the bus for memory traffic
  - Each processor/cache set has to “listen” for addresses in its cache
  - Does the “right thing” to give software a coherent view of memory
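The snooping behaviour listed above can be sketched with a toy simulation. This is a minimal illustration using a simplified three-state (Modified/Shared/Invalid) protocol, which is an assumption made here for brevity and not the protocol of any specific processor named in the deck. The names `SnoopyCache`, `Bus`, `read`, and `write` are hypothetical.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"


class SnoopyCache:
    """One cache on the shared bus; tracks a coherence state per address."""

    def __init__(self, name):
        self.name = name
        self.lines = {}  # address -> State

    def state(self, addr):
        return self.lines.get(addr, State.INVALID)

    def snoop(self, addr, is_write):
        """React to another cache's bus transaction for addr."""
        current = self.state(addr)
        if current is State.INVALID:
            return  # we hold no copy, nothing to do
        if is_write:
            # Another cache wants ownership: drop our copy.
            self.lines[addr] = State.INVALID
        elif current is State.MODIFIED:
            # Another cache reads our dirty line: downgrade to shared.
            self.lines[addr] = State.SHARED


class Bus:
    """Shared bus: every transaction is seen by every other cache."""

    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast(self, source, addr, is_write):
        for cache in self.caches:
            if cache is not source:
                cache.snoop(addr, is_write)


def read(bus, cache, addr):
    if cache.state(addr) is State.INVALID:  # read miss goes on the bus
        bus.broadcast(cache, addr, is_write=False)
        cache.lines[addr] = State.SHARED


def write(bus, cache, addr):
    if cache.state(addr) is not State.MODIFIED:  # must gain ownership first
        bus.broadcast(cache, addr, is_write=True)
    cache.lines[addr] = State.MODIFIED
```

Every miss or ownership request is broadcast to all caches, which is why the bus bandwidth becomes the limit as processors are added.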

Snoopy Bus System Architecture (diagram: several CPU cores, each with its own cache, attached to one shared bus together with memory and I/O)

ccNUMA System Architecture

- Cache-Coherent Non-Uniform Memory Access
- Memory is distributed and attached to processors
- A network connects the processor/memory sets
- Each processor owns part of the memory space
- Cache coherency protocol
  - Gives software a coherent view of memory
  - Protocol primitives for synchronization
  - Directory to keep track of who has a copy of memory
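The directory idea above can be sketched as follows. This is a minimal illustration under simplifying assumptions (one directory per home node, no message timing, no data transfer); the class name `Directory` and its methods are hypothetical and do not describe any vendor's implementation.

```python
class Directory:
    """Home-node directory: records which nodes hold a copy of each block."""

    def __init__(self):
        self.sharers = {}  # block -> set of node ids holding a read-only copy
        self.owner = {}    # block -> node id holding the exclusive (dirty) copy

    def read(self, block, node):
        """Record a read; return the nodes that must be contacted first."""
        owner = self.owner.get(block)
        if owner == node:
            return []  # requester already holds the only valid copy
        contacted = []
        if owner is not None:
            # The owner must supply the data and downgrade to a sharer.
            contacted.append(owner)
            del self.owner[block]
            self.sharers.setdefault(block, set()).add(owner)
        self.sharers.setdefault(block, set()).add(node)
        return contacted

    def write(self, block, node):
        """Record a write; return the nodes whose copies must be invalidated."""
        targets = self.sharers.pop(block, set()) - {node}
        owner = self.owner.get(block)
        if owner is not None and owner != node:
            targets.add(owner)
        self.owner[block] = node
        return sorted(targets)
```

Unlike the snoopy bus, coherence messages go only to the nodes the directory lists, so traffic does not have to be broadcast to every processor.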

ccNUMA System Architecture (diagram: nodes, each with a CPU core, cache, memory, directory, and I/O, connected through network routers to a network fabric)

SGI Origin System Architecture

SGI CrayLink™
- Node = 2 CPUs and their caches
- Module = Memory + Directory + HUB
- 2 Modules per Router
- System = Modules + Routers + CrayLink™ Network

SGI CrayLink™

Processor System Network

Bisectional Bandwidth

ccNUMA and CPU Scheduling Issues

OS’s Questions
- Single CPU System
  - What to schedule next?
- ccNUMA System
  - What to schedule next?
  - Which CPU to schedule it on?
  - Where should the process information be located?
  - One or many instances of the OS?

OS’s Choices for a Process
- Single CPU System
  - Process has 1 choice
  - Process information has 1 choice
- ccNUMA System with N CPUs and M Memories
  - Process has N choices
  - Process information has M choices per virtual page
  - “Distance” between a process and its information
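The “distance” between a process and its information can be made concrete with a small cost function. This is a sketch under stated assumptions: each node has one CPU and one memory (so N = M), distance is a hop count taken from a hypothetical matrix `HOPS` for four nodes in a ring, and every page is weighted equally. The function names are illustrative only.

```python
# Hypothetical hop counts between four nodes arranged in a ring.
HOPS = [
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
]


def placement_cost(cpu_node, page_nodes, distance):
    """Total hop distance from cpu_node to the node holding each page."""
    return sum(distance[cpu_node][mem] for mem in page_nodes)


def best_cpu(page_nodes, distance):
    """Pick the CPU node with the lowest total distance to the pages."""
    return min(
        range(len(distance)),
        key=lambda cpu: placement_cost(cpu, page_nodes, distance),
    )


# A process with three pages on node 2 and one page on node 3.
pages = [2, 2, 2, 3]
print(best_cpu(pages, HOPS))
```

With most pages on node 2, running the process on node 2 gives the lowest total distance; a real scheduler would also have to weigh load and how often each page is touched.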

Context Switch Penalty
- Single CPU System
  - Saving/Restoring process state (PCB)
  - Scheduling routine
- ccNUMA System
  - Saving/Restoring process state (PCB)
  - Scheduling routine
  - Moving the process’s information
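The two penalty lists differ by a single term, which a simple additive model makes explicit. This is only a sketch: the additive form, the parameter names, and the microsecond figures in the usage lines are assumptions for illustration, not measurements from the presentation.

```python
def context_switch_penalty(save_restore_us, scheduler_us,
                           pages_moved=0, per_page_move_us=0):
    """Additive cost model for one context switch, in microseconds.

    On a single-CPU system pages_moved stays 0, so only the first two
    terms apply. On ccNUMA, moving the process's information adds a
    third term proportional to the number of pages moved.
    """
    return save_restore_us + scheduler_us + pages_moved * per_page_move_us


# Hypothetical figures, chosen only to show the shape of the model.
single_cpu = context_switch_penalty(5, 2)
ccnuma = context_switch_penalty(5, 2, pages_moved=10, per_page_move_us=3)
print(single_cpu, ccnuma)
```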

Some Common Sense
- Replicate parts of the OS across processors
  - System calls will happen often
- Minimize process movement
  - Cost of moving a process to another CPU is high
  - Less than swapping to disk, most of the time
  - Higher than simple context switching
- But if you have to move a process
  - Minimize the amount of information to move
  - Opportunity for a cache?
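One way to express “minimize process movement” is an affinity rule: keep a process on the CPU it last ran on unless the load imbalance is large enough to justify the move. The sketch below is a hypothetical heuristic, not a scheduler described in the presentation; `pick_cpu` and `migrate_threshold` are illustrative names, and run-queue length is assumed as the load measure.

```python
def pick_cpu(last_cpu, queue_lengths, migrate_threshold=2):
    """Choose a CPU for a ready process, preferring the one it last ran on.

    The process migrates only when the least-loaded CPU's run queue is
    shorter than last_cpu's by at least migrate_threshold entries.
    """
    least_loaded = min(range(len(queue_lengths)),
                       key=queue_lengths.__getitem__)
    if queue_lengths[last_cpu] - queue_lengths[least_loaded] >= migrate_threshold:
        return least_loaded
    return last_cpu


print(pick_cpu(0, [3, 2, 3]))  # small imbalance: stay on CPU 0
print(pick_cpu(0, [5, 1, 2]))  # large imbalance: move to CPU 1
```

Raising `migrate_threshold` makes the scheduler more reluctant to pay the cost of moving a process's information to another node.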

Conclusion
- Hardware
  - Bandwidth and latency for performance
  - Cache coherency for correctness
- Operating System
  - ccNUMA adds complexity to CPU scheduling
  - Better HW performance => lower context switch penalty => more flexibility in scheduling choices for a process


Abbreviation Index
- AMD - Advanced Micro Devices
- SGI - Silicon Graphics Inc.
- ECC - Error Correction Code
- SECDED - Single Error Correct Double Error Detect
- API - Alpha Processor Inc.
- AGP - Accelerated Graphics Port
- DDR DRAM - Double Data Rate Dynamic RAM
- LDT - Lightning Data Transport
- PCI - Peripheral Component Interconnect
- CMOS - Complementary Metal Oxide Semiconductor
- CAS - Column Address Strobe
- TPC-C - Transaction Processing Performance Council Benchmark
- ccNUMA - Cache-Coherent Non-Uniform Memory Access
- SMP - Symmetric Multi-Processing