Presentation transcript:

“Elbowing out”: on a single processor, a large data set repeatedly pushes (“elbows”) useful data out of the cache; divided among p processors, each share is more likely to fit in fast memory. This effect is the usual explanation for superlinear speedup, i.e., ψ(n,p) > p.

Efficiency is speedup per processor:

$$\varepsilon(n,p) = \frac{\text{Sequential execution time}}{\text{Processors used} \times \text{Parallel execution time}} = \frac{\text{Speedup}}{\text{Processors used}} = \frac{\psi(n,p)}{p}$$
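A minimal C sketch of how these two metrics come out of wall-clock timings (the timing helper uses POSIX gettimeofday(); the two execution times below are hypothetical stand-ins for measured values):

#include <stdio.h>
#include <sys/time.h>

/* Wall-clock time in seconds, via POSIX gettimeofday(). */
double when(void)
{
    struct timeval tp;
    gettimeofday(&tp, NULL);
    return (double)tp.tv_sec + (double)tp.tv_usec * 1.0e-6;
}

int main(void)
{
    /* In a real measurement, bracket each run with when():
       t = when(); run_program(); t = when() - t;
       The two times below are hypothetical stand-ins. */
    double t_seq = 24.0;  /* sequential execution time, seconds (hypothetical) */
    double t_par = 4.0;   /* parallel execution time on p CPUs (hypothetical)  */
    int p = 8;            /* processors used */

    double speedup    = t_seq / t_par;        /* psi(n,p) = sequential / parallel */
    double efficiency = speedup / (double)p;  /* epsilon(n,p) = psi(n,p) / p      */

    printf("speedup = %.2f, efficiency = %.2f\n", speedup, efficiency);
    return 0;
}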

All terms > 0 ⇒ ε(n,p) > 0. Denominator > numerator ⇒ ε(n,p) < 1. Hence 0 < ε(n,p) < 1.

Let f = σ(n)/(σ(n) + φ(n)); i.e., f is the fraction of the computation which is inherently sequential (σ(n) is the inherently serial work, φ(n) the parallelizable work).
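The transcript skips the bound this definition feeds into; reconstructing it in the notation just defined (this is Amdahl's Law, ignoring the parallel overhead κ(n,p)):

$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p} = \frac{1}{f + (1 - f)/p} \le \frac{1}{f}$$

For example, with f = 0.1 the speedup can never exceed 10, no matter how many processors are used.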


[Figure: speedup versus number of processors for problem sizes n = 100, n = 1,000, and n = 10,000; larger problem sizes sustain higher speedup.]

Let $T_p = \sigma(n) + \varphi(n)/p = 1$ unit of time, and let $s$ be the fraction of time the parallel program spends executing the serial portion of the code: $s = \sigma(n)/(\sigma(n) + \varphi(n)/p)$. Then

$$\psi = T_1/T_p = T_1 \le s + p(1 - s) \quad \text{(the scaled speedup)}$$

Thus the sequential time would be the time for the serial portion plus $p$ times the parallelized portion of the code.

Restated:

$$\psi \le s + p(1 - s) = p - (p - 1)s \quad \text{(the scaled speedup)}$$

Thus the sequential time would be $p$ times the parallel execution time minus $(p - 1)$ times the serial portion of the execution time.

Execution on 1 CPU takes 10 times as long… …except that 9 of the 10 processors' shares of the work do not include the serial code, which runs only once.
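For the record, the arithmetic these two lines summarize, under an assumption: p = 10 is implied by the slide, but the serial fraction s = 0.03 is borrowed from the example that usually accompanies this material and may not match the original slide:

$$\psi \le p - (p - 1)s = 10 - 9 \times 0.03 = 9.73$$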

The experimentally determined serial fraction (the Karp-Flatt metric):

$$e = \frac{\text{inherently serial component of the parallel computation} + \text{processor communication and synchronization overhead}}{\text{single-processor execution time}}$$
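In practice e is obtained from the measured speedup ψ on p processors. The slide's formula did not survive extraction; the standard closed form of the Karp-Flatt metric is:

$$e = \frac{1/\psi - 1/p}{1 - 1/p}$$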

[Table: measured speedup ψ and Karp-Flatt metric e for increasing processor counts up to p = 8; ψ reaches 4.7 and e stays fixed at 0.1.]
What is the primary reason for a speedup of only 4.7 on 8 CPUs? Since e is constant, a large serial fraction is the primary reason.
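The metric is easy to evaluate mechanically; a minimal C sketch, checked against the one (p, ψ) pair that survives in the transcript:

#include <stdio.h>

/* Karp-Flatt experimentally determined serial fraction:
   e = (1/psi - 1/p) / (1 - 1/p). */
double karp_flatt(double psi, int p)
{
    return (1.0 / psi - 1.0 / p) / (1.0 - 1.0 / p);
}

int main(void)
{
    /* The one data point preserved above: psi = 4.7 on p = 8 CPUs. */
    printf("e = %.3f\n", karp_flatt(4.7, 8));  /* prints: e = 0.100 */
    return 0;
}

This matches the constant e = 0.1 in the table.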

[Table: measured speedup ψ and Karp-Flatt metric e for increasing processor counts up to p = 8; ψ again reaches 4.7, but e grows steadily with p.]
What is the primary reason for a speedup of only 4.7 on 8 CPUs? Since e is steadily increasing, parallel overhead is the primary reason.

Deriving the isoefficiency relation:
- Determine the total overhead $T_0(n,p)$.
- Substitute the overhead into the speedup equation.
- Substitute $T(n,1) = \sigma(n) + \varphi(n)$.
- Assume efficiency is constant; hence $T_0/T(n,1)$ must remain a constant fraction.

Isoefficiency relation:

$$T(n,1) \ge C\,T_0(n,p)$$

where $T(n,1)$ is the sequential execution time (reflecting problem size and sequential complexity), $T_0(n,p)$ is the total serial and communication overhead, and $C = \varepsilon(n,p)/(1 - \varepsilon(n,p))$ is a constant fixed by the efficiency to be maintained.
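To connect the relation to the figure that follows: if the isoefficiency relation is $n \ge f(p)$ and $M(n)$ denotes the memory required for a problem of size $n$, then $M(f(p))/p$ describes how much memory each processor needs to sustain a fixed efficiency. As a worked example (a reconstruction; the parallel-reduction example is the standard companion to these slides):

$$T(n,1) = \Theta(n), \qquad T_0(n,p) = \Theta(p \log p) \quad\Rightarrow\quad n \ge C\,p \log p$$
$$M(n) = n \quad\Rightarrow\quad \frac{M(C\,p \log p)}{p} = C \log p$$

So memory per processor must grow as $C \log p$, one of the curves in the plot below.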

[Figure: memory needed per processor versus number of processors, for the scalability functions $C\,p\log p$, $C\,p$, $C\log p$, and $C$, plotted against the fixed memory size available per processor. Curves that stay within the available memory per processor can maintain efficiency; curves that outgrow it cannot.]