12a.1 Introduction to Parallel Computing UNC-Wilmington, C. Ferner, 2008 Nov 4, 2008

12a.2 Basic Principles

12a.3 Parallel Programming
Parallel computation is defined as splitting the tasks of a program so that they are executed by multiple processors.
The basic idea is that n processors can do the work of 1 processor in 1/nth of the time (this is called speedup).
Data dependency reduces the possibility for speedup.

12a.4 Data Dependency
A data dependency occurs when one processor (P1) computes a value required by another processor (P2).
Suppose the following two statements are executed by two separate processors:
x = f(y, z);
area = x*x*PI;

12a.5 Data Dependency (continued)
The second processor (P2) needs the value computed by the first processor (P1) in order to correctly compute the area.
This requires that P2
– wait for P1 to compute the value of x (synchronization), and
– have access to that value (communication).
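A minimal MPI sketch of this situation (an illustration, not part of the original slides; rank 0 plays the role of P1, rank 1 plays P2, and f() is a made-up stand-in). The blocking MPI_Recv on rank 1 provides the synchronization and the communication at once.

#include <math.h>
#include <stdio.h>
#include <mpi.h>

#define PI 3.14159265358979

/* Hypothetical stand-in for the f(y, z) on the previous slide. */
static double f(double y, double z) { return sqrt(y * y + z * z); }

int main(int argc, char *argv[]) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                                  /* P1 */
        double x = f(3.0, 4.0);                       /* x = f(y, z); */
        MPI_Send(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);   /* communication */
    } else if (rank == 1) {                           /* P2 */
        double x;
        MPI_Recv(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                  /* blocks until P1 has sent x: synchronization */
        double area = x * x * PI;                     /* area = x*x*PI; */
        printf("area = %f\n", area);
    }

    MPI_Finalize();
    return 0;
}

Run with two processes (e.g., mpirun -np 2 ./a.out); P2 cannot compute the area until the message carrying x has arrived.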

12a.6 Data Dependency (continued)
The four types of dependencies:
– True: P1 modifies a variable needed by P2
– Anti: P1 modifies a variable whose old value P2 still needs
– Output: Both P1 and P2 modify the same variable
– (Input: Both P1 and P2 only read a variable)
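The same four cases written out as C statement pairs (illustrative only; the variables and statements are assumptions, not taken from the slides):

/* Illustrative statement pairs for the four dependency types. */
void dependence_examples(void) {
    int a = 1, b = 2, c = 3, d, e, x, y;

    /* True (flow) dependence, read-after-write:
       the second statement needs the value the first one computes. */
    a = b + c;
    d = a * 2;

    /* Anti dependence, write-after-read:
       the first statement needs the old value of a before it is overwritten. */
    e = a - 1;
    a = 0;

    /* Output dependence, write-after-write:
       both statements modify a; the final value depends on their order. */
    a = d;
    a = e;

    /* Input "dependence": both statements only read a, so they can run in either order. */
    x = a + 1;
    y = a + 2;

    (void)x; (void)y;   /* silence unused-variable warnings */
}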

12a.7 Types of Parallel Computers
Shared-memory:
– All processors have access to all memory
[Diagram: processors connected to shared memory modules through an interconnection network]

12a.8 Types of Parallel Computers (continued)
Distributed-memory:
– All processors have their own memory
– Communication is done through message passing (which adds latency)
[Diagram: each processor with its own memory module, connected through an interconnection network]

12a.9 Types of Parallel Computers (continued)
Distributed-shared memory:
– All processors have their own memory, but the memory is virtually shared
[Diagram: processors each with local memory, connected through an interconnection network and presented as one virtual shared memory]

12a.10 Types of Parallel Computers (continued)
When P1 computes a value needed by P2:
– On a shared-memory system, P2 already has access to the computed value, but it still needs to wait for P1 to finish the computation (synchronization)
– On a distributed-memory system, P1 needs to send a message to P2 containing the required value (synchronization and communication)
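For the shared-memory case, a minimal OpenMP sketch (again with a hypothetical f(); not from the original slides): x lives in shared memory, so no message is needed, but the implicit barrier at the end of the single region makes the other threads wait until x has been computed.

#include <math.h>
#include <stdio.h>
#include <omp.h>

#define PI 3.14159265358979

static double f(double y, double z) { return sqrt(y * y + z * z); }  /* hypothetical */

int main(void) {
    double x = 0.0;                    /* shared: every thread can see it */

    #pragma omp parallel shared(x)
    {
        #pragma omp single             /* one thread plays the role of P1 ... */
        x = f(3.0, 4.0);
        /* ... and the implicit barrier after 'single' makes the others (P2) wait. */

        double area = x * x * PI;      /* no message needed: x is already visible */
        printf("thread %d: area = %f\n", omp_get_thread_num(), area);
    }
    return 0;
}

Compiled with an OpenMP-enabled compiler (e.g., gcc -fopenmp), every thread prints the same area once the barrier has been passed.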

12a.11 Types of Parallel Computers (continued)
When P1 computes a value needed by P2:
– On a distributed-shared memory system, the programmer only needs to be concerned with synchronization
– However, when P2 accesses P1's memory, a message containing the contents of P1's memory must be sent implicitly

12a.12 Parallel Execution Metrics
Granularity – a measure of how big or small the individual units of computation are.
With message passing, large granularity is important to reduce the time spent sending messages:
granularity = computation time / communication time = t_comp / t_comm
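One way to estimate this ratio on a real machine is to time the two phases separately, for example with MPI_Wtime(). The sketch below is an illustration; the loop workload and the message size are made up.

#include <stdio.h>
#include <mpi.h>

#define N 100000

int main(int argc, char *argv[]) {
    static double buf[N];
    double t0, t_comp, t_comm;
    int i;

    MPI_Init(&argc, &argv);

    /* Computation phase (arbitrary local work). */
    t0 = MPI_Wtime();
    for (i = 0; i < N; i++)
        buf[i] = (double)i * 0.5;
    t_comp = MPI_Wtime() - t0;

    /* Communication phase. */
    t0 = MPI_Wtime();
    MPI_Allreduce(MPI_IN_PLACE, buf, N, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    t_comm = MPI_Wtime() - t0;

    printf("t_comp = %g s, t_comm = %g s, granularity = %g\n",
           t_comp, t_comm, t_comp / t_comm);

    MPI_Finalize();
    return 0;
}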

12a.13 Parallel Execution Metrics (continued)
Speedup – a measure of relative performance between a parallel program and a sequential program:
S(n) = execution time on 1 processor / execution time on n processors = t_s / t_p
or, using the fastest known sequential algorithm (written t_s*):
S(n) = fastest known execution time on 1 processor / execution time on n processors = t_s* / t_p
Why should speedup be no better than linear (i.e., S(n) <= n)?

12a.14 Parallel Execution Metrics (continued)
Suppose S(n) > n. Then t_s*/t_p > n, so t_s* > n*t_p.
If we implement the parallel solution on 1 processor (where the 1 processor does the work of all the others), it would take n*t_p time to complete the work.
However, this would be a sequential program that is faster than the fastest sequential program, which is a contradiction; hence S(n) <= n.

12a.15 Parallel Execution Metrics (continued)
Efficiency – a measure of how well the processors are being used:
E(n) = execution time on 1 processor / (execution time on n processors * n) = t_s / (t_p * n) = S(n) / n
An efficiency of 100% means that the processors are all busy; 50% means that they are used half of the time on average.

12a.16 Parallel Execution Metrics (continued)
Cost – a measure of how much total CPU time is used (or wasted) to perform the computation:
Cost = t_p * n = t_s / E(n)
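The three formulas combined in a small C helper (a sketch; the parallel time t_p = 10 on n = 4 processors is a made-up value for illustration, and only t_s = 25 comes from the example on the next slide):

#include <stdio.h>

/* Speedup, efficiency and cost from the sequential time ts,
   the parallel time tp, and the number of processors n. */
static void report(double ts, double tp, int n) {
    double speedup    = ts / tp;        /* S(n) = ts / tp   */
    double efficiency = speedup / n;    /* E(n) = S(n) / n  */
    double cost       = tp * n;         /* Cost = tp * n    */
    printf("n=%d  S(n)=%.2f  E(n)=%.1f%%  Cost=%.1f\n",
           n, speedup, efficiency * 100.0, cost);
}

int main(void) {
    report(25.0, 10.0, 4);   /* hypothetical values: S(4)=2.50, E(4)=62.5%, Cost=40.0 */
    return 0;
}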

12a.17 Example of Metrics Suppose we have a problem where the sequential execution time is 25 units and the parallel times are shown in the table below:

12a.18 Example of Metrics (continued)

12a.19 Example of Metrics (continued)