Complexity Measures for Parallel Computation


Work and span
TP = execution time on P processors
T1 = work
T∞ = span*
Work Law: TP ≥ T1/P
Span Law: TP ≥ T∞
* Also called critical-path length or computational depth.
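
A small illustration of these quantities (our own example, not from the slides): summing n numbers with a balanced binary reduction tree has work T1 = n − 1 additions and span T∞ = ⌈log2 n⌉, so the Work and Span Laws bound TP from below by both (n − 1)/P and log2 n. A minimal C sketch that tabulates these values:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* Balanced binary-tree sum of n numbers (illustrative example). */
        for (long n = 8; n <= 1048576; n *= 16) {
            double work = (double)n - 1.0;        /* T1: total additions     */
            double span = ceil(log2((double)n));  /* T-infinity: tree depth  */
            printf("n = %8ld  work = %10.0f  span = %2.0f  work/span = %.1f\n",
                   n, work, span, work / span);
        }
        return 0;
    }

(Compile with -lm; the work/span column is the potential parallelism defined later in these slides.)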

Series composition (A followed by B)
Work: T1(A∪B) = T1(A) + T1(B)
Span: T∞(A∪B) = T∞(A) + T∞(B)

Parallel composition (A alongside B)
Work: T1(A∪B) = T1(A) + T1(B)
Span: T∞(A∪B) = max{T∞(A), T∞(B)}
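
These rules compose mechanically; a minimal C sketch (with made-up work/span numbers for A and B) showing how series and parallel composition combine:

    #include <stdio.h>

    /* Work and span of a sub-computation (sub-DAG). */
    typedef struct { double work, span; } Cost;

    /* Series composition: A followed by B. */
    Cost series(Cost a, Cost b) {
        Cost c = { a.work + b.work, a.span + b.span };
        return c;
    }

    /* Parallel composition: A alongside B. */
    Cost parallel(Cost a, Cost b) {
        Cost c = { a.work + b.work, a.span > b.span ? a.span : b.span };
        return c;
    }

    int main(void) {
        Cost a = { 100.0, 10.0 };   /* hypothetical sub-computation A */
        Cost b = {  60.0, 30.0 };   /* hypothetical sub-computation B */
        Cost s = series(a, b), p = parallel(a, b);
        printf("series:   work = %g, span = %g\n", s.work, s.span);
        printf("parallel: work = %g, span = %g\n", p.work, p.span);
        return 0;
    }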

Speedup
Def. T1/TP = speedup on P processors.
If T1/TP = Θ(P), we have linear speedup;
if T1/TP = P, we have perfect linear speedup;
if T1/TP > P, we have superlinear speedup, which is not possible in this model because of the Work Law TP ≥ T1/P.

Parallelism
Because the Span Law dictates that TP ≥ T∞, the maximum possible speedup given T1 and T∞ is
T1/T∞ = (potential) parallelism = the average amount of work per step along the span.

Complexity measures for parallel computation
Problem parameters:
  n   index of problem size
  p   number of processors
Algorithm parameters:
  tp  running time on p processors
  t1  time on 1 processor = sequential time = "work"
  t∞  time on unlimited processors = critical-path length = "span"
  v   total communication volume
Performance measures:
  speedup                  s  = t1 / tp
  efficiency               e  = t1 / (p * tp) = s / p
  (potential) parallelism  pp = t1 / t∞
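
A minimal C sketch computing these measures from hypothetical timings (the values below are illustrative, not from the slides):

    #include <stdio.h>

    int main(void) {
        double t1   = 64.0;   /* work: time on 1 processor (seconds) */
        double tp   = 10.0;   /* running time on p processors        */
        double tinf =  2.0;   /* span: time on unlimited processors  */
        int    p    = 8;

        double s  = t1 / tp;         /* speedup                 */
        double e  = t1 / (p * tp);   /* efficiency = s / p      */
        double pp = t1 / tinf;       /* (potential) parallelism */

        printf("speedup = %.2f, efficiency = %.2f, parallelism = %.1f\n",
               s, e, pp);
        return 0;
    }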

Laws of parallel complexity
Work law: tp ≥ t1 / p
Span law: tp ≥ t∞
Amdahl's law: if a fraction f, between 0 and 1, of the work must be done sequentially, then speedup ≤ 1 / f.
Exercise: prove Amdahl's law from the span law.
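
As a small worked illustration of Amdahl's law (assuming the usual model in which the parallel part scales perfectly, so tp = f·t1 + (1 − f)·t1/p; the sequential fraction f below is a made-up value):

    #include <stdio.h>

    /* Best-case speedup with sequential fraction f on p processors. */
    double amdahl_speedup(double f, int p) {
        return 1.0 / (f + (1.0 - f) / p);
    }

    int main(void) {
        double f = 0.1;   /* assumed sequential fraction of the work */
        for (int p = 1; p <= 1024; p *= 4)
            printf("p = %4d  speedup <= %.2f\n", p, amdahl_speedup(f, p));
        printf("limit as p grows: 1/f = %.1f\n", 1.0 / f);
        return 0;
    }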

Detailed complexity measures for data movement I
Moving data between processors by message passing.
Machine parameters:
  α or tstartup   latency (message startup time, in seconds)
  β or tdata      inverse bandwidth (in seconds per word)
  Between nodes of Triton, α ≈ 2.2 × 10⁻⁶ s and β ≈ 6.4 × 10⁻⁹ s/word.
Time to send and receive (or broadcast) a message of w words: α + w·β
  tcomm  total communication time
  tcomp  total computation time
Total parallel time: tp = tcomp + tcomm
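
A minimal C sketch of this α-β cost model, using the Triton numbers quoted above; the message count, message size, and computation time are made-up values:

    #include <stdio.h>

    int main(void) {
        double alpha = 2.2e-6;   /* latency (s), per the slide            */
        double beta  = 6.4e-9;   /* inverse bandwidth (s/word), per slide */

        double tcomp = 1.0e-3;   /* hypothetical total computation time   */
        int    nmsgs = 100;      /* hypothetical number of messages       */
        double words = 1000.0;   /* hypothetical words per message        */

        double tcomm = nmsgs * (alpha + words * beta);
        double tp    = tcomp + tcomm;   /* total parallel time            */

        printf("tcomm = %.3e s, tp = %.3e s\n", tcomm, tp);
        return 0;
    }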

Detailed complexity measures for data movement II
Moving data between cache and memory on one processor.
Assume just two levels in the memory hierarchy, fast and slow, with all data initially in slow memory.
  m   number of memory elements (words) moved between fast and slow memory
  tm  time per slow-memory operation
  f   number of arithmetic operations
  tf  time per arithmetic operation, tf << tm
  q   = f / m, average number of flops per slow-memory access
Minimum possible time = f · tf, when all data is in fast memory.
Actual time = f · tf + m · tm = f · tf · (1 + tm/tf · 1/q)
Larger q means the time is closer to the minimum f · tf.
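
A minimal C sketch of this two-level memory model; all parameter values are illustrative:

    #include <stdio.h>

    int main(void) {
        double tf = 1.0e-9;   /* time per arithmetic operation (s)        */
        double tm = 1.0e-7;   /* time per slow-memory operation (s)       */
        double f  = 1.0e9;    /* number of arithmetic operations          */
        double m  = 1.0e7;    /* words moved between fast and slow memory */

        double q       = f / m;            /* flops per slow-memory access */
        double minimum = f * tf;           /* all data in fast memory      */
        double actual  = f * tf + m * tm;  /* = f*tf*(1 + tm/tf * 1/q)     */

        printf("q = %.0f  minimum = %.2f s  actual = %.2f s  ratio = %.2f\n",
               q, minimum, actual, actual / minimum);
        return 0;
    }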