CS 584

Logic
The art of thinking and reasoning in strict accordance with the limitations and incapacities of the human misunderstanding. The basis of logic is the syllogism, consisting of a major and minor premise and a conclusion.

Example
Major Premise: Sixty men can do a piece of work sixty times as quickly as one man.
Minor Premise: One man can dig a post-hole in sixty seconds.
Conclusion: Sixty men can dig a post-hole in one second.

Performance Analysis: "Tar Baby"
Ask the right questions. Questions to consider:
- What is time?
- What is work?
Objectivity is the key: take a step back from your program.

Performance Analysis Statements
- There is always a trade-off between time and solution quality.
- We should compare the quality of the answer for a given execution time.
- For any performance report, find and clearly state the quality measure.

Efficiency
Efficiency is defined as speedup divided by the number of processors: E = S / P.
- With superlinear speedup, efficiency > 1. Does cache make a processor work at 110%?
- Why is communication not considered work, but rather overhead?
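
As a concrete illustration, here is a minimal C sketch of these two definitions; the timing numbers and processor count are assumed for illustration, not taken from the lecture:

    #include <stdio.h>

    /* Speedup S = T_sequential / T_parallel; efficiency E = S / P. */
    int main(void) {
        double t_seq = 120.0;  /* assumed sequential time, seconds */
        double t_par = 18.0;   /* assumed parallel time on P processors */
        int    p     = 8;

        double speedup    = t_seq / t_par;
        double efficiency = speedup / p;

        printf("speedup    = %.2f\n", speedup);     /* 6.67 */
        printf("efficiency = %.2f\n", efficiency);  /* 0.83 */
        return 0;
    }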

Speedup
Conventional speedup is defined as the ratio of sequential execution time to parallel execution time, i.e., the reduction in execution time. Consider running a problem on a slow parallel computer and on a faster one with the same serial component: the speedup will be lower on the faster computer.
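
A worked example, with numbers assumed purely for illustration, makes this concrete. Suppose the serial component takes 10 s on both machines, while the parallelizable work takes 100 s on the slow machine but only 10 s on the fast one. With P = 10:
- Slow machine: T1 = 110 s, TP = 10 + 100/10 = 20 s, speedup = 5.5.
- Fast machine: T1 = 20 s, TP = 10 + 10/10 = 11 s, speedup ≈ 1.8.
The fast machine finishes far sooner (11 s vs. 20 s), yet it reports the lower speedup.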

Speedup and Amdahl's Law
Conventional speedup penalizes faster absolute speed. The assumption that task size stays constant as computing power increases exaggerates the effect of task overhead. Scaling the problem size reduces these distortions.
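
For reference (the formula itself is not written out on the slide), Amdahl's law gives the fixed-size speedup on P processors in terms of the serial fraction f of the work:

    S(P) = 1 / (f + (1 - f)/P)  <=  1/f

For example, with f = 0.05 the speedup can never exceed 20, no matter how many processors are added.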

Solution
Gustafson introduces scaled speedup: scale the problem size as you increase the number of processors. It can be calculated in two ways:
- experimentally, or
- with analytical models.
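
For reference, Gustafson's law states that if s is the serial fraction of the parallel execution time, the scaled speedup is

    S(P) = s + P(1 - s) = P - s(P - 1)

which grows nearly linearly in P when s is small, instead of saturating at 1/f as in the fixed-size analysis.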

Traditional Speedup

    Speedup(N) = C1(N) / CP(N)

where C1(N) is the complexity (time) taken on a single processor and CP(N) is the complexity (time) taken on P processors.

Scaled Speedup

    Speedup(PN) = C1(PN) / CP(PN)

where C1 is the complexity (time) taken on a single processor and CP is the complexity (time) taken on P processors; the problem size PN grows with the number of processors.

Experimental Scaled Speedup
Keep the ratio N/P constant between the single-processor case and the many-processor case when testing.
Example: calculate the scaled speedup for 8 and 16 processors with N/P = 256. How big should the problem be?
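
Since N/P is held at 256, the problem size follows directly: N = 256 * P, i.e., N = 2048 for 8 processors and N = 4096 for 16 processors (and N = 256 for the single-processor run).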

Using Analytical Models
1. Examine the control flow of the algorithm.
2. Find a general algebraic form for the complexity (execution time).
3. Fit the curve with experimental data. If the fit is poor, find the missing terms and repeat.
4. Calculate the scaled speedup using the formula.
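
As an illustration of step 3, here is a minimal C sketch that fits the two-term serial model T(N) = a + b*N to timing data by ordinary least squares. The measurements are hypothetical (chosen to be consistent with the T = 2 + 12N model used in the next slide):

    #include <stdio.h>

    /* Fit T(N) = a + b*N to (n[i], t[i]) pairs by ordinary least squares. */
    int main(void) {
        double n[] = { 64, 128, 256, 512 };        /* assumed problem sizes */
        double t[] = { 770, 1538, 3074, 6146 };    /* assumed times, seconds */
        int    m   = 4;

        double sn = 0, st = 0, snn = 0, snt = 0;
        for (int i = 0; i < m; i++) {
            sn  += n[i];
            st  += t[i];
            snn += n[i] * n[i];
            snt += n[i] * t[i];
        }
        double b = (m * snt - sn * st) / (m * snn - sn * sn);
        double a = (st - b * sn) / m;

        printf("T(N) = %.2f + %.2f * N\n", a, b);  /* expect roughly 2 + 12N */
        return 0;
    }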

Example
Serial time = 2 + 12N seconds. Parallel time = 4 + 12N/P + 5P seconds. Let N/P = 128.
The scaled speedup for 4 processors (so N = 4 * 128 = 512) is

    Speedup = (2 + 12 * 512) / (4 + 12 * 512/4 + 5 * 4) = 6146 / 1560 ≈ 3.94
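
A short C sketch of the same calculation, sweeping the processor count; the model coefficients come from the slide, everything else is illustrative:

    #include <stdio.h>

    /* Scaled speedup for the model T_serial(N) = 2 + 12N,
       T_parallel(N, P) = 4 + 12N/P + 5P, with N/P held at 128. */
    int main(void) {
        const double n_per_p = 128.0;
        for (int p = 1; p <= 16; p *= 2) {
            double n     = n_per_p * p;  /* problem size scales with P */
            double t_ser = 2.0 + 12.0 * n;
            double t_par = 4.0 + 12.0 * n / p + 5.0 * p;
            printf("P = %2d  N = %4.0f  scaled speedup = %.2f\n",
                   p, n, t_ser / t_par);
        }
        return 0;  /* P = 4 prints 3.94, matching the slide's example */
    }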

Traditional Speedup
[Figure: speedup vs. number of processors, showing an ideal (linear) curve and a measured curve below it.]

Scaled Speedup
[Figure: scaled speedup vs. number of processors for small, medium, and large problems, with an ideal (linear) reference line; larger problems track the ideal line more closely.]

Assignment
Problems are on the web.
- Create a model for your program.
- Use the model to calculate traditional speedup and scaled speedup.
- Experimentally calculate the values.
- Compare the results.