Introduction to Symmetric Multiprocessors. Süha TUNA, Bilişim Enstitüsü (Informatics Institute), UHeM Summer Workshop, 21.06.2012.


Outline
- Shared Memory Architecture
  - SMP Architectures (NUMA, ccNUMA)
  - Cache & Cache Coherency Protocols (Snoopy, Directory Based)
  - What is a Thread?
  - What is a Process?
  - Thread vs. Process
- OpenMP vs. MPI

Shared Memory Architecture (SMP)
- CPUs access shared memory through a bus.
- All processors share a single view of data, and communication between processors can be as fast as memory accesses to the same location.
- The CPU-to-memory connection becomes a bottleneck (requires high-speed interconnects!).
[Diagram: shared memory (processors P on a common bus to one memory) vs. distributed memory (memory/processor nodes with NICs connected by a network)]

Shared Memory Architecture
- UMA (Uniform Memory Access): individual processors share memory (and I/O) in such a way that each of them can access any memory location with the same speed.
  - Many small shared-memory machines are symmetric.
  - Larger shared-memory machines do not satisfy this definition (NUMA or cc-NUMA).
- The NUMA (Non-Uniform Memory Access) architecture was designed to overcome the scalability limits of the SMP (Symmetric Multiprocessor / Shared-Memory Processor) architecture.
[Diagram: distributed shared memory: several bus-based processor/memory groups connected by a network bus]

Login to your UYBHM node using ssh, then run the cpuinfo command:

bash: $ ssh
bash: $ cpuinfo
Architecture    : x86_64
Hyperthreading  : disabled
Packages        : 2
Cores           : 4
Processors      : 4
===== Processor identification =====
Processor   Thread   Core   Package

Run the cpuinfo command:

bash: $ cpuinfo
Architecture    : x86_64
Hyperthreading  : disabled
Packages        : 2
Cores           : 4
Processors      : 4
===== Processor identification =====
Processor   Thread   Core   Package
===== Processor placement =====
Package   Cores   Processors
0         0,1     0,2
3         0,1     1,3
===== Cache sharing =====
Cache   Size    Processors
L1      32 KB   no sharing
L2      4 MB    (0,2)(1,3)

Shared Memory Architecture
What is cache?
- An extremely fast and relatively small memory unit.
  - L1 cache: built into the CPU itself.
  - L2 cache: resides on a separate chip next to the CPU.
- The CPU does not use the motherboard system bus for these data transfers.
- Reduces memory access time.
- Decreases the bandwidth requirement on the local memory module and the global interconnect.
[Diagram: memory hierarchy from CPU registers through L1 and L2 caches to main memory and disk]

Shared Memory Architecture
NUMA architecture types:
- ccNUMA means cache-coherent NUMA architecture.
- Cache coherence is the integrity of data stored in the local caches of a shared resource.

Shared Memory Architecture
Coherence defines the behavior of reads and writes to the same memory location.
- If each processor has a cache that reflects the state of various parts of memory, it is possible that two or more caches hold copies of the same line.
- Even if two threads make appropriately serialized changes to those data items, without a coherence mechanism both caches could end up with different, incorrect versions of that line of memory.
- The system's state is then no longer coherent!

Cache Coherence
Solution: a directory-based protocol or a snooping protocol (invalidate or update techniques).
[Diagram (a)-(d): CPU A and CPU B with private caches on a shared bus to memory; the value 7 is cached by both, and the copies diverge after one CPU writes the line.]

Shared Memory Architecture
Solution: cache coherence protocols!
- Protocols take one of two kinds of action when a cache line (L) is written:
  - invalidate all copies of L in the other caches of the machine, or
  - update those copies with the new value being written.
- Most modern cache-coherent multiprocessors use the invalidation technique rather than the update technique, since it is easier to implement in hardware.
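To illustrate why invalidation traffic matters from a programmer's point of view, here is a small C/OpenMP sketch (added here, not part of the original slides) in which two threads write counters that first share one cache line and then sit in separate, padded lines. On most machines the padded version runs noticeably faster, because each write no longer invalidates the other core's copy. The 64-byte line size is an assumption that depends on the hardware.

```c
/* False-sharing sketch: two threads repeatedly write counters that sit in the
 * same cache line, so every write invalidates the other core's cached copy.
 * Padding each counter to its own cache line removes the invalidation traffic.
 * Compile with: gcc -fopenmp false_sharing.c -o false_sharing
 */
#include <omp.h>
#include <stdio.h>

#define ITER 100000000L

/* assumed 64-byte cache line; volatile keeps the compiler from collapsing the loops */
struct { volatile long val; char pad[64 - sizeof(long)]; } padded[2];
volatile long shared_line[2];            /* both counters in one cache line */

int main(void)
{
    double t0 = omp_get_wtime();
    #pragma omp parallel num_threads(2)
    {
        int id = omp_get_thread_num();
        for (long i = 0; i < ITER; i++)
            shared_line[id]++;           /* invalidates the other core's line */
    }
    double t_shared = omp_get_wtime() - t0;

    t0 = omp_get_wtime();
    #pragma omp parallel num_threads(2)
    {
        int id = omp_get_thread_num();
        for (long i = 0; i < ITER; i++)
            padded[id].val++;            /* each counter in its own line */
    }
    printf("same line: %.2fs, padded: %.2fs\n", t_shared, omp_get_wtime() - t0);
    return 0;
}
```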

Main Definitions
Process
- The "heaviest" unit of kernel scheduling.
- The unit of resource allocation.
- Processes execute independently and interact with each other via interprocess communication mechanisms.
- Processes own the resources allocated by the operating system, including memory (address space) and state information.
- Each process has its own register set (temporary memory cells).

Main Definitions
Thread
- The "lightest" unit of kernel scheduling.
- The unit of execution.
- At least one thread exists within each process. If multiple threads exist within a process, they share the same memory and file resources.
- Threads share their process's address space (while each keeps a private register set and stack).
- Threads do not own resources.
"An execution entity having a serial flow of control, a set of private variables, and access to shared variables." (OpenMP Architecture Review Board)

Process vs. Thread
- A thread is a flow of control within a process and the basic unit of CPU utilization.
- A thread comprises a thread ID, a program counter, a register set and a stack.
- If two threads belong to the same process, they share its code section, data section and other operating-system resources.
- A traditional process has a single thread of control; if a process has multiple threads of control, it can do more than one task at a time.
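A minimal C sketch of this distinction (added here, not from the slides): a child created with fork() gets its own copy of the address space, so its change to a global counter is invisible to the parent, while a pthread created inside the same process shares the address space and its change is visible.

```c
/* Process vs. thread: a forked child works on a copy of the address space,
 * a thread works on the shared address space of its process.
 * Compile with: gcc thread_vs_process.c -o thread_vs_process -lpthread
 */
#include <pthread.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int counter = 0;                         /* global, part of the address space */

void *thread_body(void *arg) { counter++; return NULL; }

int main(void)
{
    pid_t pid = fork();                  /* child gets a *copy* of the address space */
    if (pid == 0) { counter++; return 0; }
    wait(NULL);
    printf("after fork:   counter = %d\n", counter);   /* still 0 in the parent */

    pthread_t t;
    pthread_create(&t, NULL, thread_body, NULL);        /* shares the address space */
    pthread_join(t, NULL);
    printf("after thread: counter = %d\n", counter);    /* now 1 */
    return 0;
}
```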

OpenMP vs. MPI
Pros of OpenMP
- Considered by some to be easier to program and debug (compared to MPI).
- Data layout and decomposition are handled automatically by directives.
- Allows incremental parallelism: directives can be added incrementally, so the program can be parallelized one portion after another; no dramatic change to the code is needed.
- Unified code for both serial and parallel applications: OpenMP constructs are treated as comments when sequential compilers are used (see the sketch below).
- Original (serial) code statements need not, in general, be modified when parallelized with OpenMP. This reduces the chance of inadvertently introducing bugs and helps maintenance as well.
- Both coarse-grained and fine-grained parallelism are possible.
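A minimal sketch of the "incremental parallelism" and "unified code" points (added for illustration, not taken from the slides): the only change to the serial loop is one directive, which a compiler without OpenMP support simply ignores, so the same source serves as both the serial and the parallel version.

```c
/* Incremental parallelism with OpenMP: one directive parallelizes the loop.
 * Compile with: gcc -fopenmp saxpy.c -o saxpy   (omit -fopenmp for the serial build)
 */
#include <stdio.h>

#define N 1000000

int main(void)
{
    static float x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    #pragma omp parallel for          /* the only change needed for parallelism */
    for (int i = 0; i < N; i++)
        y[i] = 2.0f * x[i] + y[i];

    printf("y[0] = %f\n", y[0]);
    return 0;
}
```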

OpenMP vs. MPI
Cons of OpenMP
- Currently runs efficiently only on shared-memory multiprocessor platforms.
- Requires a compiler that supports OpenMP.
- Scalability is limited by the memory architecture.
- Reliable error handling is missing.
- Lacks fine-grained mechanisms to control thread-to-processor mapping.
- Synchronization between subsets of threads is not allowed.
- Mostly used for loop parallelization.
- Can be difficult to debug, due to implicit communication between threads via shared variables.

OpenMP vs. MPI
Pros of MPI
- Does not require shared-memory architectures, which are more expensive than distributed-memory architectures.
- Can be used on a wider range of problems, since it exploits both task parallelism and data parallelism.
- Can run on both shared-memory and distributed-memory architectures (see the sketch below).
- Highly portable, with specific optimizations for the implementation on most hardware.
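A minimal MPI sketch (added for illustration, not from the slides): every rank contributes its rank number to a reduction on rank 0. The same source compiles with mpicc and runs unchanged on a single shared-memory node or across distributed-memory nodes.

```c
/* Minimal MPI example: all data movement happens through explicit messages.
 * Compile: mpicc rank_sum.c -o rank_sum     Run: mpirun -np 4 ./rank_sum
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, sum = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* explicit communication: data moves in messages, not through shared memory */
    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of ranks 0..%d = %d\n", size - 1, sum);

    MPI_Finalize();
    return 0;
}
```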

OpenMP vs. MPI
Cons of MPI
- Requires more programming changes to go from a serial to a parallel version.
- Can be harder to debug.

OpenMP vs. MPI
Different MPI and OpenMP implementations of matrix multiplication (see the sketches below).
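A possible shared-memory (OpenMP) version, added here as a sketch because the code shown on the original slide did not survive transcription: one parallel-for directive distributes the rows of C over the threads, and all threads read A and B directly from shared memory. The matrix size and initial values are placeholders.

```c
/* OpenMP matrix multiplication sketch: rows of C are split across threads.
 * Compile with: gcc -fopenmp matmul_omp.c -o matmul_omp
 */
#include <stdio.h>

#define N 512

static double A[N][N], B[N][N], C[N][N];

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) { A[i][j] = 1.0; B[i][j] = 2.0; }

    #pragma omp parallel for
    for (int i = 0; i < N; i++)           /* each thread computes a block of rows */
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }

    printf("C[0][0] = %f\n", C[0][0]);
    return 0;
}
```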

MPI vs. OpenMP Programming
Message-Passing Parallelism vs. Shared-Memory Parallelism
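For contrast, a message-passing (MPI) sketch of the same matrix multiplication, again added here rather than taken from the slide: rank 0 scatters the rows of A, broadcasts B, and gathers the result rows, so every piece of data movement is explicit. For simplicity it assumes N is divisible by the number of ranks.

```c
/* MPI matrix multiplication sketch with block-row distribution.
 * Compile: mpicc matmul_mpi.c -o matmul_mpi    Run: mpirun -np 4 ./matmul_mpi
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 512

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int rows = N / size;                        /* rows owned by each rank */
    double *A = NULL, *C = NULL;
    double *B      = malloc(N * N * sizeof(double));
    double *localA = malloc(rows * N * sizeof(double));
    double *localC = malloc(rows * N * sizeof(double));

    if (rank == 0) {                            /* rank 0 initializes the inputs */
        A = malloc(N * N * sizeof(double));
        C = malloc(N * N * sizeof(double));
        for (int i = 0; i < N * N; i++) { A[i] = 1.0; B[i] = 2.0; }
    }

    /* explicit data movement replaces the shared address space */
    MPI_Scatter(A, rows * N, MPI_DOUBLE, localA, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(B, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    for (int i = 0; i < rows; i++)              /* each rank computes only its rows */
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += localA[i * N + k] * B[k * N + j];
            localC[i * N + j] = sum;
        }

    MPI_Gather(localC, rows * N, MPI_DOUBLE, C, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("C[0] = %f\n", C[0]);

    MPI_Finalize();
    return 0;
}
```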