Symmetric and CC-NUMA

L.N. Bhuyan Adapted from Patterson’s slides

Scope
- Design experiences of SMPs and cache-coherent non-uniform memory access (CC-NUMA) systems
- NUMA
  - Natural extension of SMP systems

Architectures
[Figure: three block diagrams.
 (1) Shared memory logic structure: processors with caches, memory, and I/O joined by an interconnect.
 (2) SMP architecture: processors with caches, memory, and I/O attached to a bus/crossbar.
 (3) Multi-node structure, Node 1 through Node N: each node has processors with caches, memory, I/O, and a remote cache on its own bus/crossbar.]

Advantages of shared memory systems (SMP or CC-NUMA)
- Symmetry
  - Any processor can access any memory location and I/O device
- Single address space
  - Single system image
  - One copy of the OS, database, application, etc.
    - Resides in the shared memory
    - No user control needed over data distribution or redistribution
  - A single OS schedules processes
    - Easy workload management, dynamic load balancing

Advantages of shared memory systems (SMP or CC-NUMA), continued
- Caching
  - Data locality supported in the hierarchy
- Coherency
  - Enforced by the hardware?
    - MESI-like snoopy protocol
- Memory communication
  - Low latency
    - Simple load/store instructions
    - Hardware generates coherency information
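The "MESI-like snoopy protocol" bullet can be made concrete with a small state machine. What follows is a minimal sketch, not the protocol of any particular machine: it tracks a single cache line, ignores data values and write-back timing, and the names `SnoopyBus`, `State`, and `trace` are hypothetical.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class SnoopyBus:
    """Hypothetical bus tracking the state of ONE cache line in every cache."""

    def __init__(self, num_caches):
        self.states = [State.INVALID] * num_caches

    def read(self, cpu):
        """Processor `cpu` issues a load."""
        if self.states[cpu] is not State.INVALID:
            return  # hit: no bus traffic, state unchanged
        # Miss: a bus read is broadcast and every other cache snoops it.
        others_have_copy = False
        for other, state in enumerate(self.states):
            if other == cpu or state is State.INVALID:
                continue
            others_have_copy = True
            # An M copy supplies the data, an E copy loses exclusivity;
            # either way the snooping cache ends up in S.
            self.states[other] = State.SHARED
        self.states[cpu] = State.SHARED if others_have_copy else State.EXCLUSIVE

    def write(self, cpu):
        """Processor `cpu` issues a store."""
        if self.states[cpu] in (State.MODIFIED, State.EXCLUSIVE):
            self.states[cpu] = State.MODIFIED  # silent upgrade, no bus traffic
            return
        # From S or I: broadcast an invalidate so all other copies are dropped.
        for other in range(len(self.states)):
            if other != cpu:
                self.states[other] = State.INVALID
        self.states[cpu] = State.MODIFIED


def trace():
    """Replay a short access sequence on two caches and record the states."""
    bus = SnoopyBus(2)
    history = []
    for op, cpu in [("read", 0), ("read", 1), ("write", 0), ("read", 1)]:
        getattr(bus, op)(cpu)
        history.append("".join(s.value for s in bus.states))
    return history


print(trace())  # ['EI', 'SS', 'MI', 'SS']
```

Each string in the trace gives the state of the line in cache 0 and cache 1 after one access. The write by CPU 0 is the step that costs a bus broadcast, which is why snooping depends on a shared medium that every cache can observe.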

Basic issues that SMPs must address
- Availability
  - Biggest problem
  - The bus, memory, and OS are each a single point of failure
- Bottleneck
  - Processors compete for the memory bus and shared memory
    - Packet-switched bus (split transactions)
- Latency
  - Low, but still large compared with CPU speed
- Memory bandwidth vs. processor speed vs. memory capacity
- Scalability
  - A bus is not scalable
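A back-of-the-envelope model shows why a shared bus limits scalability. This is only a sketch under simplifying assumptions: every processor generates the same steady miss traffic, and the figures used (200 MB/s per processor, a 1600 MB/s bus) are hypothetical values chosen for illustration, not measurements from any system.

```python
def bus_utilization(num_cpus, miss_traffic_per_cpu, bus_bandwidth):
    """Fraction of bus bandwidth consumed; a value above 1.0 means saturation."""
    return num_cpus * miss_traffic_per_cpu / bus_bandwidth


def max_cpus(miss_traffic_per_cpu, bus_bandwidth):
    """Largest processor count the bus can feed without saturating."""
    return bus_bandwidth // miss_traffic_per_cpu


# Hypothetical figures in MB/s, for illustration only.
PER_CPU = 200
BUS = 1600

print(bus_utilization(4, PER_CPU, BUS))   # 0.5
print(bus_utilization(16, PER_CPU, BUS))  # 2.0
print(max_cpus(PER_CPU, BUS))             # 8
```

In this model demand grows linearly with the processor count while bus bandwidth stays fixed, so past the saturation point each added processor only increases contention.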

CC-NUMA
- Extends SMPs by connecting several SMP nodes into a larger system
- Employs a directory-based cache coherence protocol
- Attacks the scalability problem while maintaining the advantages of SMPs
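A directory-based protocol replaces the bus broadcast with point-to-point messages sent only to the nodes that hold a copy. The sketch below is a simplified illustration: the class `Directory`, its message names, and the per-block bookkeeping are assumptions made for the example, not the design of a specific CC-NUMA machine.

```python
class Directory:
    """Hypothetical home-node directory with one entry per memory block."""

    def __init__(self):
        self.sharers = {}  # block -> set of nodes holding a read-only copy
        self.owner = {}    # block -> node holding the only writable copy

    def read(self, block, node):
        """Return the point-to-point messages needed to serve a read miss."""
        messages = []
        owner = self.owner.get(block)
        if owner == node:
            return messages  # requester already owns the block
        if owner is not None:
            # Only the current owner is contacted; nothing is broadcast.
            messages.append(("fetch_and_downgrade", owner))
            del self.owner[block]
            self.sharers.setdefault(block, set()).add(owner)
        self.sharers.setdefault(block, set()).add(node)
        return messages

    def write(self, block, node):
        """Return the point-to-point messages needed to serve a write miss."""
        messages = []
        owner = self.owner.get(block)
        if owner == node:
            return messages  # requester already owns the block
        if owner is not None:
            messages.append(("fetch_and_invalidate", owner))
        # Invalidate exactly the nodes recorded as sharers, and no others.
        for sharer in sorted(self.sharers.pop(block, set()) - {node}):
            messages.append(("invalidate", sharer))
        self.owner[block] = node
        return messages


directory = Directory()
directory.read("A", 0)
directory.read("A", 1)
print(directory.write("A", 2))  # [('invalidate', 0), ('invalidate', 1)]
print(directory.read("A", 0))   # [('fetch_and_downgrade', 2)]
```

The message count depends on how many nodes actually share a block rather than on the total number of nodes, which is what lets this approach scale beyond a single bus.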

Distributed shared memory enhances:
- Scalability
  - Memory capacity and I/O capabilities increase by adding more nodes
- Bandwidth
  - An application can access multiple local memories concurrently
- Availability
  - Multiple copies of a portion of the OS can run on multiple nodes
    - Failure of one will not disrupt the entire system

Programming
- We said that
  - "data structures get distributed"
  - "Cache coherency then tracks the changes"
- Any issues? (remote cache vs. local memory)
  - P, Q: processes
  - A, B: arrays

             P:        Q:
  Phase 1:   use(A)    use(B)
  Phase 2:   use(B)    use(A)
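The two-phase example can be explored with a toy placement model. The sketch assumes a first-touch policy, meaning an array is placed in the local memory of the node that uses it first and is never migrated; that policy, the function `count_accesses`, and the node numbers are assumptions made for illustration and do not come from the slides.

```python
def count_accesses(schedule, home_of_process):
    """Count local and remote array uses per phase under first-touch placement.

    schedule: list of phases, each a dict mapping process -> array used.
    home_of_process: dict mapping process -> node it runs on.
    Returns a list of (local, remote) tuples, one per phase.
    """
    home_of_array = {}
    counts = []
    for phase in schedule:
        local = remote = 0
        for process, array in phase.items():
            node = home_of_process[process]
            # Assumption: the first node to use an array becomes its home,
            # and the array stays there for the rest of the run.
            home = home_of_array.setdefault(array, node)
            if home == node:
                local += 1
            else:
                remote += 1
        counts.append((local, remote))
    return counts


# P runs on node 1 and Q on node 2 (hypothetical placement).
schedule = [{"P": "A", "Q": "B"}, {"P": "B", "Q": "A"}]
placement = {"P": 1, "Q": 2}
print(count_accesses(schedule, placement))  # [(2, 0), (0, 2)]
```

Under this model every use in phase 1 is local and every use in phase 2 is remote, so phase 2 is served either across the interconnect or from the node's remote cache. The program stays correct because coherence tracks the data, but its performance depends on where each array lives.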