CSCI-455/552 Introduction to High Performance Computing, Lecture 1

Why We Need Powerful Computers

Traditional scientific and engineering paradigm:
– Do theory or paper design.
– Perform experiments or build a system.

Limitations:
– Too expensive: building a throw-away passenger jet.
– Too slow: waiting for climate or galactic evolution.
– Too dangerous: weapons, drug design, climate experimentation.

Computational science paradigm:
– Use high performance computer systems to model the phenomenon in detail, using known physical laws and efficient numerical methods.

(The following slides are based on Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers, 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc.)

Grand Challenge Problems

A grand challenge problem is one that cannot be solved in a reasonable amount of time with today’s computers. Obviously, an execution time of 10 years is always unreasonable.

Examples:
– Modeling large DNA structures
– Global weather forecasting
– Modeling motion of astronomical bodies

More Grand Challenge Problems
– Crash simulation
– Earthquake and structural modeling
– Medical studies, e.g. genome data analysis
– Web services and web search engines
– Financial and economic modeling
– Transaction processing
– Drug design, e.g. protein folding
– Nuclear weapons: testing by simulation

Weather Forecasting

The atmosphere is modeled by dividing it into 3-dimensional cells. The calculations for each cell are repeated many times to model the passage of time.

Global Weather Forecasting Example

Suppose the whole global atmosphere is divided into cells of size 1 mile × 1 mile × 1 mile to a height of 10 miles (10 cells high) – about 5 × 10^8 cells. Suppose each calculation requires 200 floating point operations. In one time step, 10^11 floating point operations are necessary. To forecast the weather over 7 days using 1-minute intervals, a computer operating at 1 Gflops (10^9 floating point operations/s) takes 10^6 seconds, or over 10 days. To perform the calculation in 5 minutes requires a computer operating at 3.4 Tflops (3.4 × 10^12 floating point operations/s).
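
The arithmetic behind these figures can be checked with a few lines of Python (a minimal sketch; the cell count, flops-per-cell, and time-step figures come straight from the slide above):

```python
# Sanity-check the weather-forecasting flop estimates from the slide.
cells = 5e8              # ~1-mile cells over the globe, 10 cells high
flops_per_cell = 200     # floating point operations per cell per step
flops_per_step = cells * flops_per_cell          # ~1e11 flops per time step

steps = 7 * 24 * 60      # 7 days of 1-minute time steps (~1e4 steps)
total_flops = flops_per_step * steps             # ~1e15 flops in total

gflops_machine = 1e9     # a 1 Gflops computer
print(total_flops / gflops_machine / 86400)      # ~11.7 days of compute

target_seconds = 5 * 60  # finish the forecast in 5 minutes
print(total_flops / target_seconds / 1e12)       # ~3.4 Tflops required
```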

Modeling Motion of Astronomical Bodies

Each body is attracted to each other body by gravitational forces. The movement of each body is predicted by calculating the total force on each body. With N bodies, there are N − 1 forces to calculate for each body, or approximately N^2 calculations. (N log_2 N for an efficient approximate algorithm.) After determining the new positions of the bodies, the calculations are repeated.

A galaxy might have, say, 10^11 stars. Even if each calculation were done in 1 ms (an extremely optimistic figure), it would take 10^9 years for one iteration using the N^2 algorithm, and almost a year for one iteration using an efficient N log_2 N approximate algorithm.
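
As an illustration of the O(N^2) direct-summation approach the slides describe, here is a minimal sketch in Python (my own illustration, not from the slides; the masses and positions are placeholder values):

```python
import numpy as np

def gravitational_forces(pos, mass, G=6.674e-11):
    """Direct O(N^2) summation: total gravitational force on each body."""
    n = len(mass)
    forces = np.zeros_like(pos)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r = pos[j] - pos[i]          # vector from body i to body j
            dist = np.linalg.norm(r)
            forces[i] += G * mass[i] * mass[j] * r / dist**3
    return forces

# Three bodies with arbitrary placeholder masses and positions.
pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
mass = np.array([1e24, 1e22, 1e20])
print(gravitational_forces(pos, mass))
```

Tree-based methods such as Barnes–Hut achieve the N log N cost mentioned above by approximating distant groups of bodies as single points.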

(Figure: astrophysical N-body simulation by Scott Linssen, an undergraduate UNC-Charlotte student.)

Parallel Computing

Using more than one computer, or a computer with more than one processor, to solve a problem.

Motives:
– Usually faster computation. The very simple idea is that n computers operating simultaneously can achieve the result n times faster; it will not be n times faster, for various reasons.
– Other motives include: fault tolerance, the larger amount of memory available, ...

How I Used to Make Breakfast…

How to Set the Family to Work…

How I Finally Got to the Office in Time…

Von Neumann Architecture

(Diagram: CPU connected to memory.) The memory is typically much slower than the CPU and may not have the bandwidth required to match the CPU. Some improvements have included:
– Interleaved memory
– Caching
– Pipelining
– Superscalar execution

Moore’s Law

Processor-Memory Performance Gap

(Chart: processor performance improving at roughly 60%/yr (“Moore’s Law”) versus DRAM at roughly 7%/yr, from about 1982 onward. From D. Patterson, CS252, Spring 1998, © UCB.)

Parallel vs. Serial Computers

Two big advantages of parallel computers:
1. Total performance
2. Total memory

Parallel computers enable us to solve problems that:
– benefit from, or require, a fast solution
– require large amounts of memory
– an example that requires both: weather forecasting

Background

Parallel computers – computers with more than one processor – and their programming – parallel programming – have been around for more than 40 years.

Gill writes in 1958:

“... There is therefore nothing new in the idea of parallel programming, but its application to computers. The author cannot believe that there will be any insuperable difficulty in extending it to computers. It is not to be expected that the necessary programming techniques will be worked out overnight. Much experimenting remains to be done. After all, the techniques that are commonly used in programming today were only won at the cost of considerable toil several years ago. In fact the advent of parallel programming may do something to revive the pioneering spirit in programming which seems at the present to be degenerating into a rather dull and routine occupation...”

Gill, S. (1958), “Parallel Programming,” The Computer Journal, vol. 1, April, pp.

Speedup Factor

S(p) = (execution time using one processor (best sequential algorithm)) / (execution time using a multiprocessor with p processors) = t_s / t_p

where t_s is the execution time on a single processor and t_p is the execution time on the multiprocessor. S(p) gives the increase in speed from using the multiprocessor. Use the best sequential algorithm for the single-processor system; the underlying algorithm for the parallel implementation might be (and usually is) different.
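
A minimal sketch of measuring t_s and t_p in Python (my own illustration, assuming a machine with several cores; the workload and function names are placeholders, not from the slides):

```python
import time
from multiprocessing import Pool

def work(chunk):
    """A CPU-bound placeholder task: sum of squares over a range."""
    lo, hi = chunk
    return sum(i * i for i in range(lo, hi))

if __name__ == "__main__":
    n, p = 10_000_000, 4
    chunks = [(i * n // p, (i + 1) * n // p) for i in range(p)]

    start = time.perf_counter()
    serial = sum(work(c) for c in chunks)        # t_s: one processor
    t_s = time.perf_counter() - start

    with Pool(p) as pool:
        start = time.perf_counter()
        parallel = sum(pool.map(work, chunks))   # t_p: p processors
        t_p = time.perf_counter() - start

    assert serial == parallel
    print(f"S({p}) = t_s / t_p = {t_s / t_p:.2f}")
```

On a machine with at least p cores, S(p) approaches p for an embarrassingly parallel workload like this, minus process start-up and communication overhead.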

The speedup factor can also be cast in terms of computational steps:

S(p) = (number of computational steps using one processor) / (number of parallel computational steps with p processors)

Time complexity can also be extended to parallel computations.

Maximum Speedup

The maximum speedup is usually p with p processors (linear speedup). It is possible to get superlinear speedup (greater than p), but there is usually a specific reason, such as:
– Extra memory in the multiprocessor system
– A nondeterministic algorithm

Maximum Speedup: Amdahl’s Law

(Diagram: (a) On one processor, the serial section takes f·t_s and the parallelizable sections take (1 − f)·t_s, for a total of t_s. (b) With p processors, the parallelizable part takes (1 − f)·t_s/p, so t_p = f·t_s + (1 − f)·t_s/p.)

The speedup factor is then given by:

S(p) = t_s / (f·t_s + (1 − f)·t_s/p) = p / (1 + (p − 1)f)

This equation is known as Amdahl’s law.

Even with an infinite number of processors, the maximum speedup is limited to 1/f.

Example: with only 5% of the computation being serial, the maximum speedup is 20, irrespective of the number of processors.
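
A small sketch of Amdahl’s law in Python, reproducing the 5% example above (the function name is mine):

```python
def amdahl_speedup(f, p):
    """Amdahl's law: speedup with serial fraction f on p processors."""
    return p / (1 + (p - 1) * f)

f = 0.05  # 5% of the computation is serial
for p in (4, 16, 256, 4096):
    print(p, round(amdahl_speedup(f, p), 2))   # 3.48, 9.14, 18.62, 19.91

print(1 / f)  # the limit as p -> infinity: maximum speedup of 20
```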

Superlinear Speedup Example: Searching

(Diagram (a): searching each sub-space sequentially. The search space is divided into p sub-spaces, each taking t_s/p to search; starting from time 0, the solution is found after x sub-space searches (x·t_s/p) plus a further Δt, where x is indeterminate.)

(Diagram (b): searching each sub-space in parallel. The solution is found in time Δt.)

The speed-up is then given by:

S(p) = (x · t_s/p + Δt) / Δt

The worst case for the sequential search is when the solution is found in the last sub-space search. Then the parallel version offers the greatest benefit, i.e.

S(p) = (((p − 1)/p) · t_s + Δt) / Δt → ∞ as Δt tends to zero

The least advantage for the parallel version is when the solution is found in the first sub-space search of the sequential search, i.e.

S(p) = Δt / Δt = 1

The actual speed-up depends upon which sub-space holds the solution, but it could be extremely large.
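
A quick numerical sketch of the searching example (my own illustration; it just evaluates the speed-up formula above for each sub-space x that could hold the solution):

```python
def search_speedup(x, p, t_s, dt):
    """Speed-up when the sequential search finds the solution in
    sub-space x (0-based), while the parallel search always takes dt."""
    return (x * t_s / p + dt) / dt

p, t_s, dt = 8, 100.0, 0.1
for x in range(p):
    print(x, round(search_speedup(x, p, t_s, dt), 1))
# x = 0 gives S(p) = 1; x = p - 1 gives S(p) = 876 here,
# far greater than p = 8: superlinear.
```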

Amdahl’s Law vs. Reality

Amdahl’s law provides a theoretical upper limit on parallel speedup, assuming that there are no costs for communication. In reality, communication (and I/O) will result in a further degradation of performance.

Gustafson’s Law

Thus, Amdahl’s law predicts that there is a maximum scalability for an application, determined by its parallel fraction, and this limit is generally not large.

There is a way around this: increase the problem size.
– Bigger problems mean bigger grids or more particles: bigger arrays.
– The number of serial operations generally remains constant while the number of parallel operations increases: the parallel fraction increases.
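
The slide stops short of the formula, but the standard statement of Gustafson’s law is S(p) = p − (p − 1)s, where s is the serial fraction measured on the parallel system. A short sketch contrasting it with Amdahl’s law (function names are mine):

```python
def amdahl(f, p):
    """Fixed problem size: f is the serial fraction of the sequential run."""
    return p / (1 + (p - 1) * f)

def gustafson(s, p):
    """Scaled problem size: s is the serial fraction of the parallel run."""
    return p - (p - 1) * s

for p in (16, 256, 4096):
    print(p, round(amdahl(0.05, p), 1), round(gustafson(0.05, p), 1))
# Amdahl saturates near 1/f = 20; Gustafson's scaled speedup keeps
# growing almost linearly in p, because the parallel part of the work
# grows with the problem size while the serial part stays constant.
```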