Dr. Wei Chen, Professor, Tennessee State University. Lectures on Parallel and Distributed Computing (August 1, 2013).


Outline
Lecture 1: Introduction to parallel computing
Lecture 2: Parallel computational models
Lecture 3: Parallel algorithm design and analysis
Lecture 4: Distributed-memory programming with PVM/MPI
Lecture 5: Shared-memory programming with OpenMP
Lecture 6: Shared-memory programming with GPU
Lecture 7: Introduction to distributed systems
Lecture 8: Synchronous network algorithms
Lecture 9: Asynchronous shared-memory/network algorithms

References
(1) Lectures 1, 2 & 3: Joseph JaJa, An Introduction to Parallel Algorithms, Addison-Wesley, 1992.
(2) Lectures 4, 5 & 6: Peter S. Pacheco, An Introduction to Parallel Programming, Morgan Kaufmann Publishers, 2011.
(3) Lectures 7 & 8: Nancy A. Lynch, Distributed Algorithms, Morgan Kaufmann Publishers, 1996.

Lecture 1: Introduction to Parallel Computing

Why Parallel Computing
Problems with large computing complexity: computationally hard problems (e.g., NP-complete problems) require exponential computing time.
Problems with large input size: quantum chemistry, statistical mechanics, relativistic physics, astrophysics, fluid mechanics, biology, genetic engineering, …
For example, simulating 1 second of the reaction between a protein molecule and water molecules takes about 1 hour on a current computer.

Why Parallel Computing
Physical limitation of CPU computational power: over the past 50 years, CPU speed doubled roughly every 2.5 years, but there is a physical limitation. Signals cannot travel faster than light (about 3 × 10^8 m/s), so at 10 GHz a signal covers only about 3 cm per clock cycle, which is roughly the size of a chip; the CPU clock rate is therefore expected to be limited to about 10 GHz.
Approaches to computationally hard problems: parallel processing, DNA computers, quantum computers, …

What Is Parallel Computing
Using a number of processors to process one task.
Speeding up the processing by distributing it among the processors, as in the sketch below.
(Diagram: one problem divided among several processors.)
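As a concrete illustration (not from the slides), here is a minimal C/OpenMP sketch of one task, summing an array, being distributed across the available processors; the array size and contents are arbitrary placeholder choices.

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N];          /* the data for the single task */
    double sum = 0.0;

    for (int i = 0; i < N; i++)
        a[i] = 1.0;

    /* The one task (summing the array) is split across the processors:
       each thread sums a chunk, and the partial sums are combined. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %.0f (threads available: %d)\n", sum, omp_get_max_threads());
    return 0;
}
```

Compiled with, for example, gcc -fopenmp, the loop runs roughly p times faster on p cores for large N, which is exactly the speedup-by-distribution idea described above.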

Classification of Parallel Computers
Two kinds of classification:
1. Flynn's classification
   SISD (Single Instruction stream, Single Data stream)
   MISD (Multiple Instruction stream, Single Data stream)
   SIMD (Single Instruction stream, Multiple Data stream)
   MIMD (Multiple Instruction stream, Multiple Data stream)
2. Classification by memory organization
   Shared memory
   Distributed memory

Flynn's Classification (1)
SISD (Single Instruction, Single Data) computer: the von Neumann single-processor computer.
(Diagram: control unit → processor → memory, with one instruction stream and one data stream.)

Flynn's Classification (2)
MISD (Multiple Instruction, Single Data) computer: all processors share a common memory, have their own control devices, and execute their own instructions on the same data.
(Diagram: one memory supplying a single data stream to several processors, each with its own control unit and instruction stream.)

Flynn's Classification (3)
SIMD (Single Instruction, Multiple Data) computer: processors execute the same instructions on different data, and their operations are synchronized by a global clock.
(Diagram: one control unit broadcasting a single instruction stream to several processors, each with its own data stream, connected through shared memory or an interconnection network.)

Flynn's Classification (4)
MIMD (Multiple Instruction, Multiple Data) computer: processors have their own control devices and execute different instructions on different data. Their operations proceed asynchronously most of the time. Such a system is also called a distributed computing system.
(Diagram: several processors, each with its own control unit, instruction stream, and data stream, connected through shared memory or an interconnection network.)

Classification by Memory Type (1)
1. Parallel computers with a shared common memory: communication is based on the shared memory. For example, suppose processor i sends some data to processor j. First, processor i writes the data to the shared memory; then processor j reads the data from the same address of the shared memory, as in the sketch below.
(Diagram: several processors attached to one shared memory.)
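A minimal sketch of this write-then-read exchange using OpenMP threads in C (not from the slides; the value 42 and the two-thread setup are illustrative assumptions):

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    int shared_buf = 0;              /* a location in shared memory */

    #pragma omp parallel num_threads(2)
    {
        int id = omp_get_thread_num();

        if (id == 0)
            shared_buf = 42;         /* "processor i" writes the data */

        #pragma omp barrier          /* make the write visible before any read */

        if (id == 1)                 /* "processor j" reads the same address */
            printf("thread 1 read %d from shared memory\n", shared_buf);
    }
    return 0;
}
```

The barrier plays the role of the synchronization that guarantees the reader sees the writer's value.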

Classification by Memory Type (2)
Features of parallel computers with shared common memory:
Programming is easy.
Exclusive control is necessary for accesses to the same memory cell; see the sketch below.
Realization is difficult when the number of processors is large, because the number of processors that can be connected to a shared memory is limited by physical factors such as the size and voltage of the units and the latency of memory access.
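A minimal sketch of such exclusive control, again with OpenMP in C (illustrative; the counter and iteration count are arbitrary). Without the critical section, concurrent increments of the same memory cell could be lost.

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    long counter = 0;    /* one shared memory cell updated by every thread */

    #pragma omp parallel num_threads(4)
    {
        for (int i = 0; i < 100000; i++) {
            #pragma omp critical     /* exclusive control: one thread at a time */
            counter++;
        }
    }

    printf("counter = %ld\n", counter);   /* prints 400000 */
    return 0;
}
```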

Classification by Memory Type (3)
2. Parallel computers with distributed memory: the communication style is one-to-one over an interconnection network. For example, suppose processor i sends data to processor j. First, processor i issues a send command such as "send xxx to processor j"; then processor j obtains the data with a receive command, as in the sketch below.
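A minimal MPI sketch in C of this one-to-one exchange between two processes (not from the slides; the message value 42, the tag 0, and the ranks 0 and 1 are illustrative choices):

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, data = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                 /* "processor i" issues the send command */
        data = 42;
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {          /* "processor j" gets the data by receiving */
        MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("process 1 received %d\n", data);
    }

    MPI_Finalize();
    return 0;
}
```

Run with, for example, mpirun -np 2 after compiling with mpicc; each process has its own memory, and the data only moves through the explicit send/receive pair.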

Classification by Memory Type (4)
Features of parallel computers with distributed memory:
There are various interconnection-network architectures (generally, the degree of connectivity is not large).
Programming is more difficult than with shared common memory, since communication is one-to-one.
It is easy to increase the number of processors.

Types of Parallel Computers with Distributed Memory
Complete connection type: any two processors are directly connected.
Features: strong communication ability, but not practical (each processor has to be connected to many other processors).
Mesh connection type: processors are connected as a two-dimensional lattice.
Features: each processor is connected to only a few others, so it is easy to increase the number of processors; however, the distance between processors is large (about 2√n for an n-processor square mesh), and processor-communication bottlenecks can arise.

Types of Parallel Computers with Distributed Memory
Hypercube connection type: processors are connected as a hypercube. Each processor has a binary label, and two processors are connected if and only if their labels differ in exactly one bit (see the sketch below).
Features: small distance between processors (log n), balanced communication load because of the symmetric structure, and it is easy to increase the number of processors.
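A small C sketch of this labeling rule (illustrative; the dimension d = 3 and node label 5 are arbitrary choices): flipping each bit of a node's label with XOR enumerates its neighbors.

```c
#include <stdio.h>

int main(void) {
    int d = 3;          /* hypercube dimension: n = 2^d = 8 processors */
    int p = 5;          /* example node, binary label 101 */

    /* Node p is connected to exactly the d nodes whose labels differ
       from p in one bit, obtained by flipping each bit in turn. */
    for (int bit = 0; bit < d; bit++)
        printf("neighbor across dimension %d: %d\n", bit, p ^ (1 << bit));

    return 0;
}
```

For node 5 (101) this prints neighbors 4 (100), 7 (111), and 1 (001), i.e., exactly log n = 3 neighbors.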

Types of Parallel Computers with Distributed Memory
Other connection types: tree, butterfly, and bus connection types.
Criteria for selecting an interconnection network:
Small diameter (the largest distance between processors) for small communication delay; the sketch below compares diameters.
Symmetric structure, so that the number of processors can be increased easily.
The choice of interconnection network depends on the application, the ability of the processors, the upper bound on the number of processors, and other factors.
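To make the "small diameter" criterion concrete, here is a small C sketch (illustrative; n = 1024 is an assumed machine size) computing the diameters of the networks discussed above.

```c
#include <stdio.h>
#include <math.h>

/* Diameter (largest processor-to-processor distance) of common networks
   with n processors; a rough guide for the "small diameter" criterion. */
int main(void) {
    int n = 1024;
    int side = (int)sqrt((double)n);             /* assume a square 2-D mesh */

    printf("complete connection: %d\n", 1);
    printf("2-D mesh (%dx%d):    %d\n", side, side, 2 * (side - 1));
    printf("hypercube:           %d\n", (int)round(log2((double)n)));
    return 0;
}
```

For n = 1024 this prints 1 for the complete connection, 62 for the 32x32 mesh, and 10 for the hypercube, showing why the hypercube balances low diameter against modest connectivity.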

Real Parallel Processing Systems (1)
An early parallel computer: ILLIAC IV, built in 1972. A SIMD machine with distributed memory, consisting of 64 processors. It used a modified mesh connection, equipped with a common data bus, a common control bus, and one control unit.

Real Parallel Processing Systems (2)
Parallel computers of the 1990s.
Shared common memory type: workstations, the SGI Origin2000, and other machines with 2-8 processors.
Distributed memory type:

Name     Maker        Processors  Processing type  Network type
CM-2     TM           65536       SIMD             hypercube
CM-5     TM           1024        MIMD             fat tree
nCUBE2   NCUBE        8192        MIMD             hypercube
iWarp    CMU, Intel   64          MIMD             2D torus
Paragon  Intel        4096        MIMD             2D torus
SP-2     IBM          512         MIMD             HP switch
AP1000   Fujitsu      1024        MIMD             2D torus
SR2201   Hitachi      1024        MIMD             crossbar

Real Parallel Processing Systems (3)
Deep Blue: developed by IBM solely for playing chess; it defeated the world chess champion. It was based on the general-purpose parallel computer SP-2.
(Diagram: 32 RS/6000 nodes connected by an interconnection network (generalized hypercube); each node has memory, a Microchannel bus interface, and 8 Deep Blue chess chips.)

K Computer – Fujitsu
Architecture: 88,128 SPARC64 VIIIfx 2.0 GHz 8-core processors; 864 cabinets, each with 96 computing nodes and 6 I/O nodes; 6-dimensional Tofu torus interconnect; Linux-based enhanced operating system; open-source Open MPI library; 12.6 MW; petaflops-class performance; ranked #3 in 2011.

Titan – Oak Ridge National Laboratory
Architecture: 18,688 AMD Opteron CPUs; Cray Linux; 8.2 MW; GPU-based; petaflops-class performance; torus topology; ranked #1 on the Top500 list.

Tianhe-1A
Architecture: 14,336 Xeon X5670 processors, 7,168 Nvidia Tesla M2050 GPUs, and 2,048 FeiTeng-1000 SPARC-based processors; 4.7 petaflops; 112 computer cabinets and 8 I/O cabinets; 11-D hypercube topology with IB QDR/DDR; Linux; ranked #2 on the Top500 list.

Exercises
Compare the top three supercomputers in the world.