Short Course: Advanced programming with MPI
Edgar Gabriel, Spring 2007

Multi-core processors

Parallel Computer

Parallel Computer
– PC cluster in U height per node
– Myrinet network
– Fast Ethernet network
(1U = 1.75” = 4.45 cm)

Parallel Computer
– PC cluster in U height
– InfiniBand network
– Gigabit Ethernet network

Earth Simulator
– Target: achieve high-speed numerical simulations with a processing speed 1000 times higher than that of the most frequently used supercomputers of the time
– 640 nodes, 8 processors/node
– 40 TFLOPS peak performance
– 10 TByte memory

Top 500 List (I)

Top 500 List (II)

Accompanying literature (I)
– Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar: “Introduction to Parallel Computing”, Pearson Education, 2003.
– John May: “Parallel I/O for High Performance Computing”, Morgan Kaufmann, 2000.

Accompanying literature (II)
– Timothy G. Mattson, Beverly A. Sanders, Berna L. Massingill: “Patterns for Parallel Programming”, Software Patterns Series, Addison-Wesley, 2004.
– Jack Dongarra, Ian Foster, Geoffrey Fox, William Gropp, Ken Kennedy, Linda Torczon, Andy White: “Sourcebook of Parallel Computing”, Morgan Kaufmann Publishers, 2003.
– Michael J. Quinn: “Parallel Programming in C with MPI and OpenMP”, McGraw-Hill, 2004.
– L. Ridgeway Scott, Terry Clark, Babak Bagheri: “Scientific Parallel Computing”, Princeton University Press, 2005.

Flynn’s Taxonomy
– SISD: Single instruction, single data
  – Classical von Neumann architecture
– SIMD: Single instruction, multiple data
– MISD: Multiple instructions, single data
  – Non-existent; listed only for completeness
– MIMD: Multiple instructions, multiple data
  – Most common and general parallel machine

Single Instruction Multiple Data (I)
– Also known as array processors
– A single instruction stream is broadcast to multiple processors, each having its own data stream
[Figure: a control unit issues one instruction stream to a row of processors, each reading its own data stream]

Single Instruction Multiple Data (II)
– Interesting detail: the handling of if-conditions (emulated in the sketch below)
  – First, all processors for which the if-condition is true execute the corresponding code section; the other processors are on hold
  – Second, all processors for which the if-condition is false execute the corresponding code section; the other processors are on hold
– Some architectures in the early 90s used SIMD (MasPar, Thinking Machines)
– No SIMD machines are available today, but the SIMD concept is used in the processors of your graphics card
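
To make the two-phase execution concrete, here is a minimal sketch in plain C (not from the slides) that emulates the masking a SIMD machine applies: the condition is evaluated into a per-lane mask, and each branch is then executed only by the lanes whose mask matches. The lane count and data values are made up for illustration.

    /* Emulation of SIMD if-handling: every lane runs every phase,
       but a mask decides which lanes actually write results. */
    #include <stdio.h>

    #define LANES 8   /* one "lane" stands for one SIMD processor */

    int main(void) {
        int a[LANES] = {3, -1, 4, -1, 5, -9, 2, -6};
        int b[LANES];
        int mask[LANES];

        /* Phase 0: every lane evaluates the condition into a mask bit. */
        for (int i = 0; i < LANES; i++)
            mask[i] = (a[i] > 0);

        /* Phase 1: lanes with a true mask execute the "then" branch;
           the other lanes are on hold. */
        for (int i = 0; i < LANES; i++)
            if (mask[i]) b[i] = a[i];

        /* Phase 2: lanes with a false mask execute the "else" branch. */
        for (int i = 0; i < LANES; i++)
            if (!mask[i]) b[i] = -a[i];

        for (int i = 0; i < LANES; i++)
            printf("%d ", b[i]);      /* prints the absolute values */
        printf("\n");
        return 0;
    }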

Multiple Instructions Multiple Data (I)
– Each processor has its own instruction stream and input data
– Most general case: every other scenario can be mapped to MIMD
– Further breakdown of MIMD is usually based on the memory organization
  – Shared memory systems
  – Distributed memory systems

Shared memory systems (I)
– All processes have access to the same address space
  – E.g. a PC with more than one processor
– Data exchange between processes by writing/reading shared variables
  – Shared memory systems are easy to program
  – Current standard in scientific programming: OpenMP (a minimal example follows this slide)
– Two versions of shared memory systems are available today
  – Symmetric multiprocessors (SMP)
  – Non-uniform memory access (NUMA) architectures
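
As a flavor of how simple shared-memory programming can be, the following is a minimal OpenMP sketch (illustrative, not from the slides): the threads exchange data purely by writing and reading a shared array, and a reduction combines their partial results. Compile with an OpenMP-capable compiler, e.g. gcc -fopenmp.

    /* Minimal OpenMP example: one address space, data exchange
       happens implicitly through the shared array a[]. */
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        double a[1000], sum = 0.0;

        #pragma omp parallel for
        for (int i = 0; i < 1000; i++)
            a[i] = 2.0 * i;          /* each thread writes part of the shared array */

        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < 1000; i++)
            sum += a[i];             /* all threads read the same shared data */

        printf("sum = %f\n", sum);
        return 0;
    }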

Symmetric multi-processors (SMPs)
– All processors share the same physical main memory
– Memory bandwidth per processor is the limiting factor for this type of architecture
– Typical size: 2-16 processors
[Figure: several CPUs attached to a single shared memory]

NUMA architectures (I)
– Some memory is closer to a certain processor than other memory
  – The whole memory is still addressable from all processors
  – Depending on which data item a processor retrieves, the access time may vary strongly
[Figure: four CPU/memory pairs linked to one another; every CPU can reach every memory, but its local memory is closest]

NUMA architectures (II)
– Reduces the memory bottleneck compared to SMPs
– More difficult to program efficiently
  – First-touch policy: a data item is placed in the memory of the processor that touches it first (exploited in the sketch below)
– To reduce the effects of non-uniform memory access, caches are often used
  – ccNUMA: cache-coherent non-uniform memory access architectures
– Largest example as of today: SGI Origin with 512 processors
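
The first-touch policy can be exploited from OpenMP, as this sketch illustrates (an assumption-laden illustration, not from the slides): the array is initialized with the same parallel schedule that the later compute loop uses, so each page is first touched, and therefore placed, near the thread that will work on it. This assumes first-touch page placement (as on Linux) and threads that stay pinned to their processors.

    /* First-touch placement sketch for a NUMA system. */
    #include <stdlib.h>
    #include <omp.h>

    #define N (1L << 24)

    int main(void) {
        double *a = malloc(N * sizeof *a);   /* pages are not placed yet */
        if (!a) return 1;

        /* Parallel initialization: the thread that touches a page first
           causes it to be allocated in that thread's local memory. */
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < N; i++)
            a[i] = 0.0;

        /* Compute phases use the same schedule, so each thread
           mostly accesses pages in its local memory. */
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < N; i++)
            a[i] = a[i] + 1.0;

        free(a);
        return 0;
    }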

Distributed memory machines (I)
– Each processor has its own address space
– Communication between processes happens by explicit data exchange, e.g.
  – Sockets
  – Message passing (see the MPI sketch below)
  – Remote procedure call / remote method invocation
[Figure: several CPU/memory pairs connected by a network interconnect]
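
A minimal message-passing sketch in MPI, the model this course focuses on: process 0 explicitly sends a value that process 1 explicitly receives; there is no shared variable through which the data could flow. Run with at least two processes, e.g. mpirun -np 2.

    /* Minimal MPI example: explicit data exchange between two processes. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* send to rank 1 */
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);                          /* receive from rank 0 */
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }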

Distributed memory machines (II)
– The performance of a distributed memory machine depends strongly on the quality and the topology of the network interconnect
  – Off-the-shelf technology: e.g. Fast Ethernet, Gigabit Ethernet
  – Specialized interconnects: Myrinet, InfiniBand, Quadrics, …

Distributed memory machines (III)
– Two classes of distributed memory machines:
  – Massively parallel processing systems (MPPs)
    – Tightly coupled environment
    – Single system image (specialized OS)
  – Clusters
    – Off-the-shelf hardware components such as Intel P4 or AMD Opteron processors
    – Standard operating systems such as Linux, Windows, or BSD UNIX

Hybrid systems
– E.g. clusters of multi-processor nodes (see the hybrid MPI/OpenMP sketch below)
[Figure: nodes containing two CPUs that share one memory each, connected by a network interconnect]
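
A common way to program such a hybrid system is to combine both models, as in this minimal sketch (illustrative, not from the slides): MPI exchanges data between the nodes while OpenMP threads share the work within each node. A production code would call MPI_Init_thread to declare its threading requirements; here no MPI call occurs inside a parallel region, so MPI_Init suffices.

    /* Hybrid sketch: MPI across cluster nodes, OpenMP within a node. */
    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char **argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double local = 0.0;
        #pragma omp parallel for reduction(+:local)
        for (int i = 0; i < 1000000; i++)      /* threads share work within a node */
            local += 1.0 / (i + 1.0);

        double global;
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0,
                   MPI_COMM_WORLD);            /* combine results across nodes */

        if (rank == 0) printf("global sum = %f\n", global);
        MPI_Finalize();
        return 0;
    }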