Distributed and Parallel Processing George Wells
Terminology (cont.)
Flynn’s Taxonomy
- Single Instruction stream, Single Data stream (SISD) = serial computer
- Single Instruction stream, Multiple Data stream (SIMD) = processor arrays / vector processors / GPUs
- Multiple Instruction stream, Single Data stream (MISD)
- Multiple Instruction stream, Multiple Data stream (MIMD) = multiprocessors
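A minimal sketch (my own illustration in Python, using NumPy and the standard multiprocessing module, none of which appear in the slides) loosely contrasting the SIMD and MIMD styles: one instruction applied across many data elements, versus independent instruction streams each working on its own data.

# SIMD style: one instruction (a vector add) applied to many data elements at once.
import numpy as np
a = np.arange(8)
b = np.arange(8)
print(a + b)        # element-wise addition across the whole array

# MIMD style: independent instruction streams running on separate processors.
from multiprocessing import Pool

def square(x):      # one "program" ...
    return x * x

def cube(x):        # ... and a different one, running concurrently
    return x * x * x

if __name__ == "__main__":
    with Pool(2) as pool:
        r1 = pool.apply_async(square, (3,))
        r2 = pool.apply_async(cube, (3,))
        print(r1.get(), r2.get())   # 9 27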
Terminology
Middleware
- Connectivity software
- Functional set of APIs
- “the software layer that lies between the operating system and the applications on each side”
Terminology
Data Access
- DBMSs house data and manage access to it
- Allow disparate data sources to be viewed in a consistent way
- Database middleware – data passing
Terminology
MOM – Message Oriented Middleware
- Resides between applications and the network infrastructure
- Refers to the process of distributing data and control through the exchange of messages
- Includes message passing and message queueing models
- Asynchronous and synchronous communication
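A minimal sketch of the message-queueing style (my own illustration: plain Python threads and queue.Queue stand in for a real MOM product, and the queue name and message format are made up). The sender enqueues a message and carries on (asynchronous send), while the receiver blocks on the queue (synchronous receive).

import queue, threading

orders = queue.Queue()          # stands in for a named queue managed by the MOM

def producer():
    orders.put({"id": 1, "item": "widget"})   # asynchronous send: returns immediately
    print("producer: message queued, continuing with other work")

def consumer():
    msg = orders.get()          # synchronous receive: blocks until a message arrives
    print("consumer: processing", msg)
    orders.task_done()

threading.Thread(target=producer).start()
threading.Thread(target=consumer).start()
orders.join()                   # wait until the queued message has been processed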
Granularity
The term grain is used to indicate the amount of computation performed between synchronisations:
- Coarse grain – large amounts of computation between synchronisations
- Fine grain – little computation between synchronisations
Communication : Computation Ratio
- Important performance characteristic when communication is explicit (e.g. message passing)
- Related to grain size
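A small worked example (my own illustration, not from the slides): for an n × n grid computation block-partitioned across p processors, each processor computes on roughly n²/p points per step but exchanges only its block boundary of roughly 4n/√p points, so the communication-to-computation ratio falls (the grain coarsens) as n grows.

import math

def comm_to_comp_ratio(n, p):
    # Assumed cost model: one unit of work per grid point per step,
    # one value exchanged per boundary point of each processor's block.
    computation = n * n / p                 # points computed per processor per step
    communication = 4 * n / math.sqrt(p)    # boundary points exchanged per step
    return communication / computation

for n in (100, 1000, 10000):
    print(n, round(comm_to_comp_ratio(n, p=16), 4))
# The ratio shrinks as n grows: larger grain, relatively less communication.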
Hardware Models
- The RAM (Random Access Machine) model provides a useful abstraction
- We can reason about the performance of algorithms, etc.
- Can we create a similar model for parallel systems?
PRAM – Parallel Random Access Machine
- Multiple processing units connected to a shared memory unit
- Instructions executed in lock-step
  - Simplifies synchronisation
- Multiple simultaneous accesses to one memory location
  - Differing approaches: disallowed; must all write the same value; one (randomly selected) write succeeds; etc.
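A minimal sketch (my own illustration; the function and policy labels are hypothetical, though they correspond to the standard exclusive-write and concurrent-write PRAM variants) of resolving simultaneous writes to one shared-memory cell under the differing approaches listed above.

import random

def resolve_writes(pending, policy):
    """Resolve simultaneous writes (a list of (processor, value) pairs) to one cell."""
    if policy == "exclusive":                  # concurrent writes disallowed
        if len(pending) > 1:
            raise RuntimeError("concurrent write not permitted")
        return pending[0][1]
    if policy == "common":                     # all writers must write the same value
        values = {v for _, v in pending}
        if len(values) > 1:
            raise RuntimeError("writers disagree")
        return values.pop()
    if policy == "arbitrary":                  # one (randomly selected) writer succeeds
        return random.choice(pending)[1]
    raise ValueError(policy)

# Processors 0 and 1 both write to the same location in the same lock-step cycle:
print(resolve_writes([(0, 42), (1, 42)], "common"))     # 42
print(resolve_writes([(0, 42), (1, 99)], "arbitrary"))  # 42 or 99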
Problem
- PRAM does not adequately model memory behaviour
  - Assumes all memory accesses take unit time
  - Overhead of enforcing consistency grows with the number of processors
CTA – Candidate Type Architecture
- Distinguishes between local and non-local memory accesses
- Multiple processors connected by some form of “network”
CTA
[Figure: processors P0, P1, P2, …, Pm connected by an interconnection network; each node comprises a processor, memory and a NIC, with a small fixed number of network connections (1 <= n <= 6)]
CTA
Data references can be:
- Local (unit cost)
- Non-local (cost λ, the non-local memory latency – a multiple of the local cost)
Models for non-local access:
- Shared memory – high hardware cost, poor scalability
- 1-sided communication – one processor “gets” and “puts” non-local data; requires synchronisation
- Message passing – explicit “send” and “receive” required
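A minimal sketch (my own illustration; the value of λ and the reference counts are made up) of how the CTA cost model charges local versus non-local references, so the same algorithm can be compared under different data placements.

def cta_cost(local_refs, nonlocal_refs, lam=100):
    """Estimated memory-access cost under the CTA model.

    Local references cost one unit each; non-local references cost
    lam units each (an assumed non-local latency of 100x local cost)."""
    return local_refs * 1 + nonlocal_refs * lam

# Same total number of references, different data placements:
print(cta_cost(local_refs=10_000, nonlocal_refs=0))     # all data local
print(cta_cost(local_refs=9_000, nonlocal_refs=1_000))  # 10% non-local dominates the cost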
Processor Topologies
Criteria to measure effectiveness in implementing parallel algorithms:
- Diameter of the network = largest distance between two nodes
- Bisection width = minimum number of edges that must be removed to split the network in two
- Number of edges per node
- Maximum edge length
Processor Topologies
Ideal organisation:
- Low diameter – gives a lower bound on complexity for algorithms that require communication between arbitrary nodes
- High bisection width – in algorithms with large amounts of data movement, the size of the data divided by the bisection width puts a lower bound on complexity
- Number of edges per node constant, independent of network size – scalability
- Maximum edge length constant – scalability
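A minimal sketch (my own illustration; brute-force, so only practical for tiny networks) that computes two of the criteria above – diameter and bisection width – for a small graph given as an adjacency list, here a 2 × 2 mesh standing in for any of the topologies that follow.

from collections import deque
from itertools import combinations

def diameter(adj):
    """Largest shortest-path distance between any two nodes (BFS from every node)."""
    best = 0
    for src in adj:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        best = max(best, max(dist.values()))
    return best

def bisection_width(adj):
    """Minimum number of edges crossing any split of the nodes into two equal halves.

    Brute force over all balanced partitions, so only feasible for small graphs."""
    nodes = list(adj)
    best = None
    for half in combinations(nodes, len(nodes) // 2):
        half = set(half)
        crossing = sum(1 for u in half for v in adj[u] if v not in half)
        best = crossing if best is None else min(best, crossing)
    return best

# A 2 x 2 mesh: node 0 adjacent to 1 and 2, node 3 adjacent to 1 and 2.
mesh = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(diameter(mesh), bisection_width(mesh))   # 2 2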
Processor Topologies
- Mesh
- Pyramid
- Shuffle-Exchange
- Butterfly
- Hypercube / Cube-connected
- Cube-connected Cycles
- Others: Binary Tree; Hypertree; de Bruijn Network; minimum path
Simple 2-D Mesh
Wrap-around Mesh
Toroidal Wrap-around Mesh
Pyramid
- Attempt to combine the advantages of mesh networks and tree networks
- A pyramid of size p is a 4-ary tree of height log₄ p
Shuffle-Exchange Network
- Solid arrows = shuffle connections; dashed arrows = exchange connections
- Used for Discrete Fourier Transforms and sorting bitonic sequences
- Necklace of i = the nodes which a data item (starting at position i) traverses in response to shuffles
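A minimal sketch (my own illustration; the slides themselves give no formulas) of the two connection types for n = 2^k nodes: the shuffle sends node i to the left cyclic rotation of its k-bit address (equivalently 2i mod (n − 1), with nodes 0 and n − 1 fixed), the exchange flips the lowest address bit, and repeatedly following shuffles from node i traces out its necklace.

def shuffle(i, k):
    """Perfect shuffle: left cyclic rotation of the k-bit address of node i."""
    n = 1 << k
    return ((i << 1) | (i >> (k - 1))) & (n - 1)

def exchange(i):
    """Exchange connection: flip the least significant bit of the address."""
    return i ^ 1

def necklace(i, k):
    """Nodes visited by a data item starting at node i under repeated shuffles."""
    seen, j = [], i
    while j not in seen:
        seen.append(j)
        j = shuffle(j, k)
    return seen

k = 3                             # an 8-node shuffle-exchange network
for i in range(1 << k):
    print(i, "shuffle ->", shuffle(i, k), "exchange ->", exchange(i), "necklace:", necklace(i, k))
# e.g. the necklace of 1 is [1, 2, 4]; nodes 0 and 7 each form a necklace of length 1.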
Butterfly
Hypercube
Cube Connected Cycles