1 Uniform memory access (UMA) Each processor has uniform access time to memory - also known as symmetric multiprocessors (SMPs) (example: SUN ES1000) Non-uniform.

Slides:



Advertisements
Similar presentations
Multiple Processor Systems
Advertisements

L.N. Bhuyan Adapted from Patterson’s slides
Hardware & the Machine room Week 5 – Lecture 1. What is behind the wall plug for your workstation? Today we will look at the platform on which our Information.
Distributed Systems CS
SE-292 High Performance Computing
Multiprocessors— Large vs. Small Scale Multiprocessors— Large vs. Small Scale.
.1 Network Connected Multi’s [Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005]
1 Parallel Scientific Computing: Algorithms and Tools Lecture #3 APMA 2821A, Spring 2008 Instructors: George Em Karniadakis Leopold Grinberg.
Multiple Processor Systems
CSCI-455/522 Introduction to High Performance Computing Lecture 2.
Cache Coherent Distributed Shared Memory. Motivations Small processor count –SMP machines –Single shared memory with multiple processors interconnected.
CIS629 Coherence 1 Cache Coherence: Snooping Protocol, Directory Protocol Some of these slides courtesty of David Patterson and David Culler.
CS 284a, 7 October 97Copyright (c) , John Thornley1 CS 284a Lecture Tuesday, 7 October 1997.
1 Multiprocessors. 2 Idea: create powerful computers by connecting many smaller ones good news: works for timesharing (better than supercomputer) bad.
CS252/Patterson Lec /23/01 CS213 Parallel Processing Architecture Lecture 7: Multiprocessor Cache Coherency Problem.
1 Lecture 1: Parallel Architecture Intro Course organization:  ~5 lectures based on Culler-Singh textbook  ~5 lectures based on Larus-Rajwar textbook.
An Introduction to Parallel Computing Dr. David Cronk Innovative Computing Lab University of Tennessee Distribution A: Approved for public release; distribution.
1 Lecture 23: Multiprocessors Today’s topics:  RAID  Multiprocessor taxonomy  Snooping-based cache coherence protocol.
Arquitectura de Sistemas Paralelos e Distribuídos Paulo Marques Dep. Eng. Informática – Universidade de Coimbra Ago/ Machine.
1 CSE SUNY New Paltz Chapter Nine Multiprocessors.
CPE 731 Advanced Computer Architecture Snooping Cache Multiprocessors Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University of.
MultiIntro.1 The Big Picture: Where are We Now? Processor Control Datapath Memory Input Output Input Output Memory Processor Control Datapath  Multiprocessor.
1 Computer Science, University of Warwick Architecture Classifications A taxonomy of parallel architectures: in 1972, Flynn categorised HPC architectures.
Lecture 37: Chapter 7: Multiprocessors Today’s topic –Introduction to multiprocessors –Parallelism in software –Memory organization –Cache coherence 1.
Spring 2003CSE P5481 Cache Coherency Cache coherent processors reading processor must get the most current value most current value is the last write Cache.
Introduction to Symmetric Multiprocessors Süha TUNA Bilişim Enstitüsü UHeM Yaz Çalıştayı
Parallel Architectures
Computer System Architectures Computer System Software
Multiple Processor Systems. Multiprocessor Systems Continuous need for faster and powerful computers –shared memory model ( access nsec) –message passing.
MIMD Shared Memory Multiprocessors. MIMD -- Shared Memory u Each processor has a full CPU u Each processors runs its own code –can be the same program.
.1 Intro to Multiprocessors [Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005]
August 15, 2001Systems Architecture II1 Systems Architecture II (CS ) Lecture 12: Multiprocessors: Non-Uniform Memory Access * Jeremy R. Johnson.
Shared Address Space Computing: Hardware Issues Alistair Rendell See Chapter 2 of Lin and Synder, Chapter 2 of Grama, Gupta, Karypis and Kumar, and also.
Parallel Computer Architecture and Interconnect 1b.1.
Multiple Processor Systems Chapter Multiprocessors 8.2 Multicomputers 8.3 Distributed systems.
ECE200 – Computer Organization Chapter 9 – Multiprocessors.
Case Study in Computational Science & Engineering - Lecture 2 1 Parallel Architecture Models Shared Memory –Dual/Quad Pentium, Cray T90, IBM Power3 Node.
Multiple Processor Systems. Multiprocessor Systems Continuous need for faster computers –shared memory model ( access nsec) –message passing multiprocessor.
Distributed Shared Memory Based on Reference paper: Distributed Shared Memory, Concepts and Systems.
Spring 2003CSE P5481 Issues in Multiprocessors Which programming model for interprocessor communication shared memory regular loads & stores message passing.
.1 Intro to Multiprocessors. .2 The Big Picture: Where are We Now? Processor Control Datapath Memory Input Output Input Output Memory Processor Control.
MODERN OPERATING SYSTEMS Third Edition ANDREW S. TANENBAUM Chapter 8 Multiple Processor Systems Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall,
ECE 1747: Parallel Programming Basics of Parallel Architectures: Shared-Memory Machines.
+ Clusters Alternative to SMP as an approach to providing high performance and high availability Particularly attractive for server applications Defined.
Outline Why this subject? What is High Performance Computing?
3/12/2013Computer Engg, IIT(BHU)1 PARALLEL COMPUTERS- 1.
3/12/2013Computer Engg, IIT(BHU)1 INTRODUCTION-2.
Multiprocessor So far, we have spoken at length microprocessors. We will now study the multiprocessor, how they work, what are the specific problems that.
August 13, 2001Systems Architecture II1 Systems Architecture II (CS ) Lecture 11: Multiprocessors: Uniform Memory Access * Jeremy R. Johnson Monday,
Spring EE 437 Lillevik 437s06-l22 University of Portland School of Engineering Advanced Computer Architecture Lecture 22 Distributed computer Interconnection.
Multiprocessor  Use large number of processor design for workstation or PC market  Has an efficient medium for communication among the processor memory.
1 Lecture 17: Multiprocessors Topics: multiprocessor intro and taxonomy, symmetric shared-memory multiprocessors (Sections )
Distributed Computing Systems CSCI 6900/4900. Review Definition & characteristics of distributed systems Distributed system organization Design goals.
Background Computer System Architectures Computer System Software.
The University of Adelaide, School of Computer Science
Multi Processing prepared and instructed by Shmuel Wimer Eng. Faculty, Bar-Ilan University June 2016Multi Processing1.
Lecture 13 Parallel Processing. 2 What is Parallel Computing? Traditionally software has been written for serial computation. Parallel computing is the.
Group Members Hamza Zahid (131391) Fahad Nadeem khan Abdual Hannan AIR UNIVERSITY MULTAN CAMPUS.
Parallel Architecture
Introduction to parallel programming
Computer Engineering 2nd Semester
CMSC 611: Advanced Computer Architecture
Parallel and Multiprocessor Architectures – Shared Memory
Shared Memory Multiprocessors
Lecture 1: Parallel Architecture Intro
Multiprocessors - Flynn’s taxonomy (1966)
Shared Memory. Distributed Memory. Hybrid Distributed-Shared Memory.
High Performance Computing
Lecture 24: Virtual Memory, Multiprocessors
Lecture 23: Virtual Memory, Multiprocessors
Presentation transcript:

1 Uniform memory access (UMA) Each processor has uniform access time to memory - also known as symmetric multiprocessors (SMPs) (example: SUN ES1000) Non-uniform memory access (NUMA) Time for memory access depends on location of data; also known as Distributed Shared memory machines. Local access is faster than non-local access. Easier to scale than SMPs(example: SGI Origin 2000) Shared Memory: UMA and NUMA

2 SMPs: Memory Access Problems Conventional wisdom is that systems do not scale well –Bus based systems can become saturated –Fast large crossbars are expensive

3 SMPs: Memory Access Problems Cache Coherence : When a processor modifies a shared variable in local cache, different processors have different value of the variable. Several mechanism have been developed for handling the cache coherence problem Cache coherence problem –Copies of a variable can be present in multiple caches –A write by one processor may not become visible to others –They'll keep accessing stale value in their caches –Need to take actions to ensure visibility or cache coherence

4 Cache Coherence Problem Processors see different values for u after event 3 Processes accessing main memory may see very stale value Unacceptable to programs, and frequent!

5 Snooping-Based Coherence Basic idea: –Transactions on memory are visible to all processors –Processor or their representatives can snoop (monitor) bus and take action on relevant events Implementation –When a processor writes a value a signal is sent over the bus –Signal is either Write invalidate - tell others cached value is invalid and then update Write broadcast/update - tell others the new value continuously as it is updated

6 SMP Machines Various Suns such as the E10000/HPC10000 Various Intel and Alpha servers Crays: T90, J90, SV1

7 “Distributed-shared” Memory (cc-NUMA) Consists of N processors and a global address space –All processors can see all memory –Each processor has some amount of local memory –Access to the memory of other processors is slower –Cache coherence (‘cc’) must be maintained across the network as well as within nodes NonUniform Memory Access

8 cc-NUMA Memory Issues Easier to build with same number of processors because of slower access to remote memory (crossbars/buses can be less expensive than if all processors are in an SMP box) Possible to created much larger shared memory system than in SMP Similar cache problems as SMPs, but coherence must be also enforced across network now! Code writers should be aware of data distribution –Load balance –Minimize access of "far" memory

9 cc-NUMA Machines SGI Origin 2000 HP X2000, V2500

10 Distributed Memory Each of N processors has its own memory Memory is not shared Communication occurs using messages –Communication networks/interconnects Custom –Many manufacturers offer custom interconnects Off the shelf –Ethernet –ATM –HIPI –FIBER Channel –FDDI

11 Types of Distributed Memory Machine Interconnects Fully connected Array and torus –Paragon –T3E Crossbar –IBM SP –Sun HPC10000 Hypercube –Ncube Fat Trees –Meiko CS-2

12 Multi-tiered/Hybrid Machines Clustered SMP nodes (‘CLUMPS’): SMPs or even cc-NUMAs connected by an interconnect network Examples systems –All new IBM SP –Sun HPC10000s using multiple nodes –SGI Origin 2000s when linked together –HP Exemplar using multiple nodes –SGI SV1 using multiple nodes

13 Reasons for Each System SMPs (Shared memory machine): easy to build, easy to program, good price-performance for small numbers of processors; predictable performance due to UMA cc-NUMAs (Distributed Shared memory machines) : enables larger number of processors and shared memory address space than SMPs while still being easy to program, but harder and more expensive to build Distributed memory MPPs and clusters: easy to build and to scale to large numbers or processors, but hard to program and to achieve good performance Multi-tiered/hybrid/CLUMPS: combines bests (worsts?) of all worlds… but maximum scalability!