PARALLEL COMPUTING Submitted By: P. Nagalakshmi Submitted To: Dr. R. K. Rathy, HOD-CSE

Outline of the Seminar Parallel Computing Architecture Taxonomy Parallel Computer Memory Architectures Parallel Programming Models Steps for Creating a Parallel Program Design and Performance Considerations Examples

Overview of Parallel Computing Parallel computing is the simultaneous execution of the same task on multiple processors in order to obtain faster results. In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem. The compute resources can include: a single computer with multiple processors; an arbitrary number of computers connected by a network; or a combination of both.

Parallel Computing A problem is broken into discrete parts that can be solved concurrently. Each part is further broken down into a series of instructions. Instructions from each part execute simultaneously on different CPUs.

Why Use Parallel Computing? Save time Solve larger problems Provide concurrency (do multiple things at the same time) Making a single processor faster is increasingly expensive

Von Neumann Architecture Memory is used to store both program instructions and data. Program instructions are coded data which tell the computer to do something. Data is simply information to be used by the program. A central processing unit (CPU) gets instructions and/or data from memory, decodes the instructions and then performs them sequentially.

Architecture Taxonomy (Flynn's Taxonomy)
SISD (Single Instruction, Single Data): a serial (non-parallel) computer.
SIMD (Single Instruction, Multiple Data): a type of parallel computer in which all processing units execute the same instruction on different data.
MISD (Multiple Instruction, Single Data): a single data stream is fed into multiple processing units; each processing unit operates on the data independently.
MIMD (Multiple Instruction, Multiple Data): the most common type of parallel computer; processors may execute different instructions on different data.

Memory Architectures Shared Memory Distributed Memory Hybrid Distributed-Shared Memory

Shared Memory The ability of all processors to access all memory as a global address space. Multiple processors can operate independently but share the same memory resources. Changes in a memory location effected by one processor are visible to all other processors. Shared memory machines can be divided into two main classes based upon memory access times: UMA (Uniform Memory Access) and NUMA (Non-Uniform Memory Access).

Distributed Memory Requires a communication network to connect inter-processor memory. There is no concept of a global address space across all processors. Processors operate independently; each has its own local memory, and changes it makes there have no effect on the memory of other processors. Hence, the concept of cache coherency does not apply.

Hybrid Distributed-Shared Memory The largest and fastest computers in the world today employ both shared and distributed memory architectures.

Parallel Programming Models Threads Message Passing Data Parallel Other Models

Threads Model A single process can have multiple, concurrent execution paths.
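As a minimal sketch of the threads model (using POSIX threads; the thread count, array size, and work routine are illustrative, not from the original slides), each thread is a concurrent execution path within one process and operates on its own slice of a shared array:

  #include <pthread.h>
  #include <stdio.h>

  #define NTHREADS 4
  #define N 1000

  static double a[N];

  /* Each thread runs this routine on its own slice of the shared array. */
  static void *work(void *arg) {
      long id = (long)arg;                 /* thread index 0..NTHREADS-1 */
      long chunk = N / NTHREADS;
      for (long i = id * chunk; i < (id + 1) * chunk; i++)
          a[i] = i * 2.0;
      return NULL;
  }

  int main(void) {
      pthread_t tid[NTHREADS];
      for (long t = 0; t < NTHREADS; t++)
          pthread_create(&tid[t], NULL, work, (void *)t);   /* spawn concurrent paths */
      for (long t = 0; t < NTHREADS; t++)
          pthread_join(tid[t], NULL);                        /* wait for all to finish */
      printf("a[%d] = %f\n", N - 1, a[N - 1]);
      return 0;
  }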

Message Passing Model Tasks exchange data by sending and receiving messages. Data transfer usually requires cooperative operations to be performed by each process. For example, a send operation must have a matching receive operation.
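A minimal message-passing sketch in C with MPI (the two-rank setup and variable names are assumptions for illustration): rank 0's send is matched by rank 1's receive, as the model requires:

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
      int rank, x = 0;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {
          x = 4;
          /* The send on rank 0 is matched by the receive on rank 1. */
          MPI_Send(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {
          MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          printf("rank 1 received x = %d\n", x);
      }

      MPI_Finalize();
      return 0;
  }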

Data Parallel Model Most of the parallel work focuses on performing operations on a data set. The data set is typically organized into a common structure, such as an array or cube. A set of tasks works collectively on the same data structure; however, each task works on a different partition of it.
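A small data-parallel sketch using OpenMP (the array size and operation are illustrative): every thread applies the same operation to a different partition of the same array:

  #include <omp.h>
  #include <stdio.h>

  #define N 1000000

  int main(void) {
      static double a[N];

      /* Every thread performs the same operation on its own partition of a[]. */
      #pragma omp parallel for
      for (long i = 0; i < N; i++)
          a[i] = i * 2.0;

      printf("a[N-1] = %f\n", a[N - 1]);
      return 0;
  }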

Other Models Hybrid: a combination of two or more parallel programming models. SPMD (Single Program, Multiple Data): a high-level programming model in which a single program is executed by all tasks, but each task may use different data.

Other Models MPMD (Multiple Program, Multiple Data): a high-level programming model in which each task may execute the same or a different program, and each task may use different data.
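A minimal SPMD sketch with MPI (purely illustrative): all tasks run the same program, but each branches on its rank and works on its own data. An MPMD job would instead launch different executables for different tasks.

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
      int rank, size;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      /* One program image, but each task follows its own path and data. */
      if (rank == 0)
          printf("task 0 of %d: coordinating\n", size);
      else
          printf("task %d of %d: computing its own partition\n", rank, size);

      MPI_Finalize();
      return 0;
  }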

Steps for Creating a Parallel Program If you are starting with an existing serial program, debug the serial code completely. Identify the parts of the program that can be executed concurrently. Decompose the program. Code development. Compile, test, debug. Optimization.

Design and Performance Considerations Amdahl's Law Amdahl's Law states that potential program speedup is defined by the fraction of code (P) that can be parallelized: speedup = 1 / (1 - P). Taking the number of processors into account, speedup = 1 / (P/N + S), where P = parallel fraction, N = number of processors, and S = 1 - P = serial fraction.
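As a quick illustration of the second formula, the following minimal C sketch evaluates it for a few assumed values of P and N (the 90% parallel fraction and the processor counts are only examples):

  #include <stdio.h>

  /* speedup = 1 / (P/N + S), with S = 1 - P */
  static double amdahl_speedup(double p, int n) {
      return 1.0 / (p / n + (1.0 - p));
  }

  int main(void) {
      /* With 90% of the work parallelizable, the serial 10% caps the speedup:
         roughly 5.3x on 10 processors and 9.2x on 100 processors. */
      printf("P = 0.90, N = 10   -> speedup = %.2f\n", amdahl_speedup(0.90, 10));
      printf("P = 0.90, N = 100  -> speedup = %.2f\n", amdahl_speedup(0.90, 100));
      return 0;
  }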

Design and Performance Considerations (contd) Load Balancing Load balancing refers to distributing tasks in such a way as to ensure the most time-efficient parallel execution. If tasks are not distributed in a balanced way, you may end up waiting for one task to complete while other tasks are idle. Performance can be increased if work can be more evenly distributed.
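As a sketch of one common load-balancing technique (OpenMP's dynamic schedule; the work() kernel and chunk size below are illustrative assumptions), iterations are handed out to threads as they become free rather than in fixed blocks, so no thread sits idle while another finishes an expensive chunk:

  #include <omp.h>
  #include <stdio.h>

  #define N 10000

  /* Illustrative kernel whose cost grows with i, so a fixed split is unbalanced. */
  static double work(int i) {
      double s = 0.0;
      for (int k = 0; k < i; k++)
          s += k * 0.5;
      return s;
  }

  int main(void) {
      double total = 0.0;
      /* schedule(dynamic) assigns iterations to threads on demand,
         evening out the load instead of giving each thread a fixed block. */
      #pragma omp parallel for schedule(dynamic, 64) reduction(+ : total)
      for (int i = 0; i < N; i++)
          total += work(i);
      printf("total = %f\n", total);
      return 0;
  }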

Design and Performance Considerations (contd) Granularity The ratio between computation and communication is known as granularity. Fine-grain parallelism All tasks execute a small number of instructions between communication cycles. Low computation to communication ratio. Facilitates load balancing. If granularity is too fine it is possible that the overhead required for communications and synchronization between tasks takes longer than the computation.

Design and Performance Considerations (contd) Coarse-grain parallelism Typified by long computations consisting of large numbers of instructions between communication synchronization points High computation to communication ratio Implies more opportunity for performance increase Harder to load balance efficiently

Design and Performance Considerations (contd) Data Dependency A data dependency exists when there is multiple use of the same storage location by different tasks. Example:

     DO 500 J = MYSTART, MYEND
        A(J) = A(J-1) * 2.0
 500 CONTINUE

This code has a data dependency: the value of A(J-1) must be computed before A(J) can be calculated. If Task 2 owns A(J) and Task 1 owns A(J-1), Task 2 cannot compute A(J) until Task 1 has computed A(J-1) and communicated it.

Design and Performance Considerations (contd) Deadlock A condition where two or more processes are each waiting for an event or communication from one of the other processes that never arrives. In the example below, both tasks issue a blocking RECEIVE before their SEND, so each waits forever for a message the other has not yet sent:

     TASK1                       TASK2
     ------------------          ------------------
     X = 4                       Y = 8
     SOURCE = TASK2              SOURCE = TASK1
     RECEIVE (SOURCE, Y)         RECEIVE (SOURCE, X)
     DEST = TASK2                DEST = TASK1
     SEND (DEST, X)              SEND (DEST, Y)
     Z = X + Y                   Z = X + Y
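One common way to break this kind of deadlock, sketched below with MPI (the two-task setup and values are assumptions for illustration): pair the send and receive in a single MPI_Sendrecv call, or reorder so one task sends before it receives, so neither task blocks waiting for a message that has not been sent:

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
      int rank, mine, theirs;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      mine = (rank == 0) ? 4 : 8;       /* X on task 0, Y on task 1 */
      int partner = 1 - rank;           /* assumes exactly two tasks */

      /* MPI_Sendrecv pairs the send and receive in one call, so neither
         task blocks waiting for a message the other has not yet sent. */
      MPI_Sendrecv(&mine, 1, MPI_INT, partner, 0,
                   &theirs, 1, MPI_INT, partner, 0,
                   MPI_COMM_WORLD, MPI_STATUS_IGNORE);

      printf("task %d: Z = %d\n", rank, mine + theirs);
      MPI_Finalize();
      return 0;
  }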

Design and Performance Considerations (contd) Debugging Debugging parallel programs is significantly more of a challenge than debugging serial programs Parallel debuggers are beginning to become available, but much work remains to be done Use a modular approach to program development Pay as close attention to communication details as to computation details

Design and Performance Considerations (contd) Performance Monitoring and Analysis As with debugging, monitoring and analyzing parallel program execution is significantly more of a challenge than for serial programs. A number of parallel tools for execution monitoring and program analysis are available; some are quite useful, and some are also cross-platform. Work remains to be done, particularly in the area of scalability.

Where Parallel Computing Is Used Weather and ocean patterns Automobile assembly lines Daily operations within a business Building a shopping mall Image and signal processing Entertainment (image rendering) Databases and data mining

References
Tutorials in the Maui High Performance Computing Center's "SP Parallel Programming Workshop".
Tutorials at the Cornell Theory Center's "Education and Training" web page.
Foster, Ian, "Designing and Building Parallel Programs".
Carriero, Nicholas and Gelernter, David, "How to Write Parallel Programs: A First Course", MIT Press, Cambridge, Massachusetts.
Dowd, Kevin, "High Performance Computing", O'Reilly & Associates, Inc., Sebastopol, California.
Hockney, R.W. and Jesshope, C.R., "Parallel Computers 2", Hilger, Bristol and Philadelphia.
Ragsdale, Susan, ed., "Parallel Programming", McGraw-Hill, Inc., New York.
Chandy, K. Mani and Taylor, Stephen, "An Introduction to Parallel Programming", Jones and Bartlett, Boston.
www.wikipedia.org

THANK YOU