1b.1 Types of Parallel Computers

Two principal approaches:
– Shared memory multiprocessor
– Distributed memory multicomputer

ITCS 4/5145 Parallel Programming, UNC-Charlotte, B. Wilkinson, Jan 14, 2013

1b.2 Shared Memory Multiprocessor

1b.3 Conventional Computer

Consists of a processor executing a program stored in a (main) memory. Each main memory location is identified by its address. Addresses start at 0 and extend to 2^b - 1 when there are b bits (binary digits) in the address (for example, a 32-bit address reaches locations 0 to 2^32 - 1).

[Figure: processor connected to main memory; instructions flow to the processor, data flows to or from the processor.]

1b.4 Shared Memory Multiprocessor System

Natural way to extend the single-processor model: have multiple processors connected to multiple memory modules, such that each processor can access any memory module, giving one address space.

[Figure: processors connected to memory modules through processor-memory interconnections, forming a single address space.]

1b.5 Simplistic view of a small shared memory multiprocessor

Examples: dual-Pentium and quad-Pentium systems.

[Figure: processors connected to shared memory over a single bus.]

1b.6 Real computer systems have cache memory between main memory and processors: Level 1 (L1) cache and Level 2 (L2) cache.

Example: quad shared memory multiprocessor.

[Figure: four processors, each with an L1 cache, an L2 cache, and a bus interface, connected over a processor/memory bus to a memory controller and shared memory.]

1b.7 “Recent” innovation (since 2005)

Dual-core and multi-core processors: two or more independent processors in one package.

Actually an old idea, but not put into wide practice until recently, when making single processors faster hit limits principally caused by:
– Power dissipation (power wall) and clock frequency limitations
– Limits in parallelism within a single instruction stream
– Memory speed limitations (memory wall)

1b.8 “The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software”, Herb Sutter, 2005.

[Figure: graph of power dissipation and clock frequency trends over time.]

1b.9 Single “quad-core” shared memory multiprocessor

[Figure: one chip holding four processors, each with its own L1 cache and sharing an L2 cache, connected through a memory controller to shared memory.]

1b.10 Multiple quad-core multiprocessors (example: coit-grid05.uncc.edu)

[Figure: eight processors (two quad-core packages), each processor with its own L1 cache, with L2 and possibly L3 cache, connected through a memory controller to shared memory.]
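Not part of the original slides: one practical way to see how many logical processors such a node exposes to programs is to ask the operating system. A minimal sketch for Linux, assuming only the standard POSIX sysconf call:

    /* Illustrative sketch (not from the slides): report how many logical
       processors the OS makes available on this node. Linux/POSIX only. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        long n = sysconf(_SC_NPROCESSORS_ONLN);   /* processors currently online */
        if (n < 1) {
            perror("sysconf");
            return 1;
        }
        printf("Logical processors available: %ld\n", n);
        return 0;
    }

Note that the count reflects logical processors (hardware threads), which may exceed the number of physical cores.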

1b.11 Programming Shared Memory Multiprocessors

Several possible ways – we will concentrate upon using threads.

Threads – individual parallel sequences (threads), each thread having its own local variables but being able to access shared variables declared outside the threads.

1. Low-level thread libraries – the programmer calls thread routines to create and control the threads. Examples: Pthreads, Java threads.
2. Higher-level library functions and preprocessor compiler directives. Example: OpenMP – an industry standard consisting of library functions, compiler directives, and environment variables.
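To make the second approach concrete, here is a minimal, hedged OpenMP sketch in C; the vector-addition loop, array names, and sizes are invented for illustration and are not from the slides. The loop index is private to each thread while the arrays are shared, matching the local-versus-shared variable distinction above. (With the first approach the programmer would instead call pthread_create and pthread_join explicitly.)

    /* Minimal OpenMP sketch (illustrative). Compile with: gcc -fopenmp vecadd.c */
    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    static double a[N], b[N], c[N];

    int main(void)
    {
        /* Initialize the shared arrays sequentially. */
        for (int i = 0; i < N; i++) {
            a[i] = i;
            b[i] = 2.0 * i;
        }

        /* The directive creates a team of threads and divides the loop
           iterations among them; i is private to each thread, while
           a, b and c are shared. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[%d] = %.1f (up to %d threads)\n",
               N - 1, c[N - 1], omp_get_max_threads());
        return 0;
    }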

1b.12 Tasks – rather than programming with threads, which are closely linked to the physical hardware, one can program with parallel “tasks”. Promoted by Intel with their TBB (Threading Building Blocks) tools.

Other alternatives include parallelizing compilers that turn regular sequential programs into parallel programs, and special parallel languages (neither now common).
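TBB itself is a C++ template library; to keep the sketches in C, the task idea is illustrated below with OpenMP tasks instead – an assumption for illustration, not what the slide describes. Each recursive call becomes a task that the runtime schedules onto whatever worker thread is free, rather than being bound to a particular thread by the programmer.

    /* Illustrative task-parallel sketch using OpenMP tasks (the slide names
       Intel TBB; OpenMP tasks are substituted here only to show the idea in C). */
    #include <stdio.h>
    #include <omp.h>

    static long fib(int n)
    {
        long x, y;
        if (n < 2)
            return n;
        #pragma omp task shared(x)    /* child task: may run on any worker */
        x = fib(n - 1);
        #pragma omp task shared(y)
        y = fib(n - 2);
        #pragma omp taskwait          /* wait for both child tasks to finish */
        return x + y;
    }

    int main(void)
    {
        long result;
        #pragma omp parallel
        {
            #pragma omp single        /* one thread spawns the initial tasks */
            result = fib(30);
        }
        printf("fib(30) = %ld\n", result);
        return 0;
    }

This naive version spawns a task per call and is meant only to show the programming model, not to be efficient.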

1b.13 GPU clusters

Recent trend for clusters – incorporating GPUs for high performance. The GPU is often attached to the CPU through a PCI-e x16 interface and has its own separate memory. Now thousands of cores in each GPU offer orders-of-magnitude speed improvements for HPC tasks, with tens of thousands of threads possible (data-parallel programming model, see later).

1b.14 Message-Passing Multicomputer

1b.15 Message-Passing Multicomputer

Complete computers connected through an interconnection network.

[Figure: computers, each with a processor and local memory, exchanging messages over an interconnection network.]

Many interconnection networks were explored in the 1970s and 1980s, including 2- and 3-dimensional meshes, hypercubes, and multistage interconnection networks.

1b.16 Networked Computers as a Computing Platform

Became a very attractive alternative to expensive supercomputers and parallel computer systems for high-performance computing in the early 1990s.

Several early projects. Notable:
– Berkeley NOW (network of workstations) project
– NASA Beowulf project

1b.17 Key advantages:
– Very high-performance workstations and PCs readily available at low cost.
– The latest processors can easily be incorporated into the system as they become available.
– Existing software can be used or modified.

1b.18 Beowulf Clusters

A group of interconnected “commodity” computers achieving high performance at low cost, typically using commodity interconnects (high-speed Ethernet) and the Linux OS.

“Beowulf” comes from the name given to the NASA Goddard Space Flight Center cluster project.

1b.19 Cluster Interconnects
– Originally fast Ethernet on low-cost clusters
– Gigabit Ethernet – easy upgrade path
– More specialized/higher-performance interconnects available, including Myrinet and InfiniBand.

1b.20 Dedicated cluster with a master node and compute nodes

[Figure: users reach the master node of a dedicated cluster over an external network via its Ethernet interface; the master node connects through a switch on a local network to the compute nodes.]

1b.21 Software Tools for Clusters

Based upon the message-passing programming model.

User-level libraries are provided for explicitly specifying the messages to be sent between executing processes on each computer. Used with regular programming languages (C, C++, ...).

Can be quite difficult to program correctly, as we shall see.
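As a taste of what such a library-based program looks like, here is a minimal sketch assuming MPI (the library named on the next slide) as the message-passing layer; the message value and tag are invented for illustration.

    /* Minimal MPI send/receive sketch (illustrative).
       Compile with mpicc and run with: mpirun -np 2 ./a.out */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I? */

        if (rank == 0) {
            value = 42;                          /* data to send (invented) */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("Process 1 received %d from process 0\n", value);
        }

        MPI_Finalize();
        return 0;
    }

Every process runs the same program; the rank returned by MPI_Comm_rank determines which branch each process takes.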

1b.22 Next step

Learn the message-passing programming model, some MPI routines, write a message-passing program, and test it on the cluster.