Complexity Measures for Parallel Computation


Work and span
TP = execution time on P processors
T1 = work
T∞ = span*
Work Law: TP ≥ T1/P
Span Law: TP ≥ T∞
* Also called critical-path length or computational depth.
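
A small illustration of these quantities (our own example, not from the slides): summing n numbers with a balanced binary reduction tree has work T1 = n − 1 additions and span T∞ = ⌈log2 n⌉, so the Work and Span Laws bound TP from below by both (n − 1)/P and log2 n. A minimal C sketch that tabulates these values:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* Balanced binary-tree sum of n numbers (illustrative example). */
        for (long n = 8; n <= 1048576; n *= 16) {
            double work = (double)n - 1.0;        /* T1: total additions     */
            double span = ceil(log2((double)n));  /* T-infinity: tree depth  */
            printf("n = %8ld  work = %10.0f  span = %2.0f  work/span = %.1f\n",
                   n, work, span, work / span);
        }
        return 0;
    }

(Compile with -lm; the work/span column is the potential parallelism defined later in these slides.)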

Series composition (A followed by B)
Work: T1(A∪B) = T1(A) + T1(B)
Span: T∞(A∪B) = T∞(A) + T∞(B)

Parallel composition (A alongside B)
Work: T1(A∪B) = T1(A) + T1(B)
Span: T∞(A∪B) = max{T∞(A), T∞(B)}
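
These rules compose mechanically; a minimal C sketch (with made-up work/span numbers for A and B) showing how series and parallel composition combine:

    #include <stdio.h>

    /* Work and span of a sub-computation (sub-DAG). */
    typedef struct { double work, span; } Cost;

    /* Series composition: A followed by B. */
    Cost series(Cost a, Cost b) {
        Cost c = { a.work + b.work, a.span + b.span };
        return c;
    }

    /* Parallel composition: A alongside B. */
    Cost parallel(Cost a, Cost b) {
        Cost c = { a.work + b.work, a.span > b.span ? a.span : b.span };
        return c;
    }

    int main(void) {
        Cost a = { 100.0, 10.0 };   /* hypothetical sub-computation A */
        Cost b = {  60.0, 30.0 };   /* hypothetical sub-computation B */
        Cost s = series(a, b), p = parallel(a, b);
        printf("series:   work = %g, span = %g\n", s.work, s.span);
        printf("parallel: work = %g, span = %g\n", p.work, p.span);
        return 0;
    }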

Speedup
Def. T1/TP = speedup on P processors.
If T1/TP = Θ(P), we have linear speedup;
if T1/TP = P, we have perfect linear speedup;
if T1/TP > P, we have superlinear speedup, which is not possible in this model because of the Work Law TP ≥ T1/P.

Parallelism
Because the Span Law dictates that TP ≥ T∞, the maximum possible speedup given T1 and T∞ is
T1/T∞ = (potential) parallelism = the average amount of work per step along the span.

Complexity measures for parallel computation
Problem parameters:
  n   index of problem size
  p   number of processors
Algorithm parameters:
  tp  running time on p processors
  t1  time on 1 processor = sequential time = "work"
  t∞  time on unlimited processors = critical-path length = "span"
  v   total communication volume
Performance measures:
  speedup                  s  = t1 / tp
  efficiency               e  = t1 / (p * tp) = s / p
  (potential) parallelism  pp = t1 / t∞
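
A minimal C sketch computing these measures from hypothetical timings (the values below are illustrative, not from the slides):

    #include <stdio.h>

    int main(void) {
        double t1   = 64.0;   /* work: time on 1 processor (seconds) */
        double tp   = 10.0;   /* running time on p processors        */
        double tinf =  2.0;   /* span: time on unlimited processors  */
        int    p    = 8;

        double s  = t1 / tp;         /* speedup                 */
        double e  = t1 / (p * tp);   /* efficiency = s / p      */
        double pp = t1 / tinf;       /* (potential) parallelism */

        printf("speedup = %.2f, efficiency = %.2f, parallelism = %.1f\n",
               s, e, pp);
        return 0;
    }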

Laws of parallel complexity
Work law: tp ≥ t1 / p
Span law: tp ≥ t∞
Amdahl's law: if a fraction f, between 0 and 1, of the work must be done sequentially, then speedup ≤ 1 / f.
Exercise: prove Amdahl's law from the span law.
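
As a small worked illustration of Amdahl's law (assuming the usual model in which the parallel part scales perfectly, so tp = f·t1 + (1 − f)·t1/p; the sequential fraction f below is a made-up value):

    #include <stdio.h>

    /* Best-case speedup with sequential fraction f on p processors. */
    double amdahl_speedup(double f, int p) {
        return 1.0 / (f + (1.0 - f) / p);
    }

    int main(void) {
        double f = 0.1;   /* assumed sequential fraction of the work */
        for (int p = 1; p <= 1024; p *= 4)
            printf("p = %4d  speedup <= %.2f\n", p, amdahl_speedup(f, p));
        printf("limit as p grows: 1/f = %.1f\n", 1.0 / f);
        return 0;
    }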

Detailed complexity measures for data movement I
Moving data between processors by message passing.
Machine parameters:
  α or tstartup   latency (message startup time, in seconds)
  β or tdata      inverse bandwidth (in seconds per word)
  Between nodes of Triton, α ≈ 2.2 × 10⁻⁶ s and β ≈ 6.4 × 10⁻⁹ s/word.
Time to send and receive (or broadcast) a message of w words: α + w·β
  tcomm  total communication time
  tcomp  total computation time
Total parallel time: tp = tcomp + tcomm
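
A minimal C sketch of this α-β cost model, using the Triton numbers quoted above; the message count, message size, and computation time are made-up values:

    #include <stdio.h>

    int main(void) {
        double alpha = 2.2e-6;   /* latency (s), per the slide            */
        double beta  = 6.4e-9;   /* inverse bandwidth (s/word), per slide */

        double tcomp = 1.0e-3;   /* hypothetical total computation time   */
        int    nmsgs = 100;      /* hypothetical number of messages       */
        double words = 1000.0;   /* hypothetical words per message        */

        double tcomm = nmsgs * (alpha + words * beta);
        double tp    = tcomp + tcomm;   /* total parallel time            */

        printf("tcomm = %.3e s, tp = %.3e s\n", tcomm, tp);
        return 0;
    }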

Detailed complexity measures for data movement II
Moving data between cache and memory on one processor.
Assume just two levels in the memory hierarchy, fast and slow, with all data initially in slow memory.
  m   number of memory elements (words) moved between fast and slow memory
  tm  time per slow-memory operation
  f   number of arithmetic operations
  tf  time per arithmetic operation, tf << tm
  q   = f / m, average number of flops per slow-memory access
Minimum possible time = f · tf, when all data is in fast memory.
Actual time = f · tf + m · tm = f · tf · (1 + tm/tf · 1/q)
Larger q means the time is closer to the minimum f · tf.
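
A minimal C sketch of this two-level memory model; all parameter values are illustrative:

    #include <stdio.h>

    int main(void) {
        double tf = 1.0e-9;   /* time per arithmetic operation (s)        */
        double tm = 1.0e-7;   /* time per slow-memory operation (s)       */
        double f  = 1.0e9;    /* number of arithmetic operations          */
        double m  = 1.0e7;    /* words moved between fast and slow memory */

        double q       = f / m;            /* flops per slow-memory access */
        double minimum = f * tf;           /* all data in fast memory      */
        double actual  = f * tf + m * tm;  /* = f*tf*(1 + tm/tf * 1/q)     */

        printf("q = %.0f  minimum = %.2f s  actual = %.2f s  ratio = %.2f\n",
               q, minimum, actual, actual / minimum);
        return 0;
    }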