Distributed and Parallel Processing – George Wells

Performance Analysis
Sources of performance loss:
– Overheads not present in the sequential program
– Sequential code
– Idle processors (poor load balancing)
– Contention for resources

Parallel Computing Terminology
The worst-case time complexity of a parallel algorithm is a function f(n) giving the maximum running time over all inputs of size n.
The cost of a parallel algorithm is defined as its time complexity multiplied by p, the number of processors.
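As a worked example of the cost definition (the notation T(n, p) for the parallel running time is assumed here, not taken from the slides): cost = p × T(n, p). For instance, an algorithm that runs in O(log n) time on p = n processors has cost O(n log n), so it is not cost-optimal compared with an O(n)-time sequential algorithm.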

Parallel Computing Terminology
The speedup achieved by a parallel algorithm running on p processors is the ratio of the time taken by the fastest sequential algorithm to the time taken by the parallel algorithm on p processors.
The efficiency of a parallel algorithm running on p processors is its speedup divided by p.
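Written as formulas (T_seq and T_par are assumed names for the sequential and parallel execution times):
    speedup s = T_seq / T_par(p)
    efficiency E = s / p
For example, a program that takes 100 s with the best sequential algorithm and 25 s on 8 processors has s = 4 and E = 0.5 (50%).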

Speedup
Linear speedup: s = p
– Efficiency = 100%
Superlinear speedup: s > p
– Unusual (but wonderful!)
Usually:
– s < p
– Efficiency < 100%

Amdahl’s Law
If f is the inherently sequential fraction of a computation to be solved by p processors, then the speedup s is limited according to the formula:
    s ≤ 1 / (f + (1 − f) / p)

The ≤ symbol accounts for both sequential overhead and non-ideal load balancing.
f = fraction of the overall computation that is inherently sequential
(1 − f) / p = the part that can be done in parallel, spread across the p processors

Assume f = 0: substituting into the formula gives s ≤ 1 / (0 + 1/p) = p, i.e. you can’t get better than linear speedup (according to Amdahl).

This time try substituting f = 0.1 and p = ∞: the bound becomes s ≤ 1 / (0.1 + 0) = 10. The sequential component is a limiting factor, no matter how many processors are used.

Suppose that a parallel computer E has been built using the herd-of-elephants approach, where each processor is capable of performing sequential operations at a rate of X megaflops. Suppose that another parallel computer A has been built using the army-of-ants approach; each processor in this computer is capable of performing sequential operations at a rate of βX megaflops, for some β where 0 < β << 1. If parallel computer A attempts a computation whose inherently sequential fraction f is greater than β, then A will execute the algorithm more slowly than a single processor of parallel computer E. Why? Because the time A takes to execute just the sequential part of the algorithm is already greater than the time a single processor of E takes to execute the entire algorithm.
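To see why (W is an assumed symbol for the total number of operations in the computation): the inherently sequential part must run on a single processor of A, taking f·W / (βX) time, whereas one processor of E finishes the whole computation in W / X time. Since f > β implies f/β > 1, the first time already exceeds the second before A’s parallel work is even counted.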

If a parallel computer built using the army-of-ants approach is to be successful, one of the following conditions must be true:
– at least one processor must be capable of extremely fast sequential computations, or
– the problem being solved must admit a solution containing virtually no sequential components.
Computers capable of high processing speeds must also have high memory bandwidth and high I/O rates.

Multithreading

Multithreading Example: Mandelbrot Set
A fractal calculation – embarrassingly parallel
– The value of each point in a 2D space can be calculated independently of any other point
– Values are false-coloured to give a graphical representation

Mandelbrot Set

Sequential Code
A little complicated by graphics event handling in Java
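A minimal sketch of the per-point escape-time calculation (not the lecture's code; the class name, MAX_ITER and the test point are assumptions), showing why each point can be computed independently:

    class MandelPoint {
        static final int MAX_ITER = 1000;           // assumed escape limit

        // Escape-time iteration for the point c = cr + ci*i.
        static int iterations(double cr, double ci) {
            double zr = 0.0, zi = 0.0;
            int n = 0;
            while (zr * zr + zi * zi <= 4.0 && n < MAX_ITER) {
                double t = zr * zr - zi * zi + cr;  // real part of z^2 + c
                zi = 2.0 * zr * zi + ci;            // imaginary part of z^2 + c
                zr = t;
                n++;
            }
            return n;                               // iteration count is false-coloured
        }

        public static void main(String[] args) {
            // A point inside the set never escapes, so this prints MAX_ITER (1000).
            System.out.println(iterations(-0.5, 0.0));
        }
    }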

Parallel Code
Can create threads to perform the calculations
Threads in Java:
– Implement the Runnable interface
  Specifies one method: public void run();
– Create a Thread object
– Call its start method
  In turn calls the run method

Creating Threads in Java
Implement the Runnable interface
Can also extend the Thread class
– Override the run() method
The first approach is preferred
– Avoids Java's single-inheritance limitation
– Better OO style: extending Thread implies an is-a relationship that is usually not appropriate for a task
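A small sketch contrasting the two approaches (the class names are illustrative, not taken from the lecture):

    // Preferred: the task implements Runnable and is handed to a Thread.
    class PrintTask implements Runnable {
        public void run() {
            System.out.println("Hello from " + Thread.currentThread().getName());
        }
    }

    // Alternative: extend Thread and override run() – this uses up the single
    // superclass slot and claims the task is-a Thread, which is rarely true.
    class PrintThread extends Thread {
        public void run() {
            System.out.println("Hello from " + getName());
        }
    }

    // Usage:
    //   new Thread(new PrintTask()).start();
    //   new PrintThread().start();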

Java Threads
We can think of a thread as having three parts:
– A virtual CPU (provides execution ability)
– Code
– Data

Mandelbrot Application
    WorkerThread w = new WorkerThread(i, i + taskSize);
    Thread t = new Thread(w);
    t.start();
(Diagram: the reference w points to the WorkerThread object; t points to the Thread object that wraps it.)
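A hedged sketch of what WorkerThread might look like, inferred from the constructor call above (the field names and the row-based decomposition are assumptions, not the lecture's actual code):

    class WorkerThread implements Runnable {
        private final int startRow, endRow;         // half-open range of image rows

        WorkerThread(int startRow, int endRow) {
            this.startRow = startRow;
            this.endRow = endRow;
        }

        public void run() {
            for (int row = startRow; row < endRow; row++) {
                // Compute and false-colour every pixel in this row; each pixel
                // depends only on its own coordinates, so rows can be shared
                // out among threads without any synchronisation.
            }
        }
    }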

Threads in Java
Simplified lifecycle (state diagram): start() moves a thread from New to Runnable; the scheduler moves it between Runnable and Running; a blocking event moves it from Running to Blocked, and it returns to Runnable when unblocked; it becomes Dead when run() completes.
Thread scheduling is preemptive, but not necessarily time-sliced
– Use Thread.sleep(x) or Thread.yield() to let other threads run
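A tiny illustration (not from the lecture) of using Thread.sleep and Thread.yield so that other runnable threads get processor time even when the scheduler does not time-slice:

    public class YieldDemo {
        public static void main(String[] args) throws InterruptedException {
            Thread worker = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < 5; i++) {
                        System.out.println("worker step " + i);
                        Thread.yield();          // hint: give another runnable thread a turn
                    }
                }
            });
            worker.start();
            Thread.sleep(10);                    // main blocks briefly, freeing the processor
            worker.join();
        }
    }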