CS4960: Parallel Programming. Guest Lecture: Parallel Programming for Scientific Computing. Mary Hall, September 22, 2008.

Presentation transcript:

Slide 1: CS4960: Parallel Programming. Guest Lecture: Parallel Programming for Scientific Computing. Mary Hall, September 22, 2008.

Slide 2: Outline
- Introduction
- The fastest computer in the world today
- Large-scale scientific simulations
- Why writing fast parallel programs is hard
- New parallel programming languages
Material for this lecture was provided by Kathy Yelick and Jim Demmel (UC Berkeley) and Brad Chamberlain (Cray).

Slide 3: Parallel and Distributed Computing
- Limited to supercomputers? No! It is everywhere.
- Only for scientific applications? These are still important, but many new commercial and consumer applications are emerging.
- Are programming tools adequate and established? No! Many new research challenges remain. (This is my research area.)

Slide 4: Why Is This Course Important? Why Now?
- We are seeing a convergence of high-end, conventional, and embedded computing:
  - On-chip architectures look like parallel computers.
  - Languages, software development, and compilation strategies originally developed for the high end (supercomputers) are becoming important in many other domains.
- Why? Technology trends.
- Looking to the future, parallel computing for the masses demands:
  - better parallel programming paradigms,
  - more people who are trained in writing parallel programs (you!), and
  - putting all these vast machine resources to the best use.

Slide 5: The Fastest Computer in the World Today
- What is its name? Roadrunner.
- Where is it located? Los Alamos National Laboratory.
- How many processors does it have? 18,802 processor chips (~123,284 "processors").
- What kind of processors? AMD Opterons and IBM Cell/BE (the processor in the PlayStation 3).
- How fast is it? One petaflop per second: one quadrillion (1 x 10^15) floating-point operations per second.

Slide 6: Scientific Simulation: The Third Pillar of Science
- Traditional scientific and engineering paradigm:
  1) Do theory or paper design.
  2) Perform experiments or build a system.
- Limitations:
  - Too difficult: building large wind tunnels.
  - Too expensive: building a throw-away passenger jet.
  - Too slow: waiting for climate or galactic evolution.
  - Too dangerous: weapons, drug design, climate experimentation.
- Computational science paradigm:
  3) Use high-performance computer systems to simulate the phenomenon, based on known physical laws and efficient numerical methods.

Slide 7: Some Particularly Challenging Computations
- Science: global climate modeling; biology (genomics, protein folding, drug design); astrophysical modeling; computational chemistry; computational materials science and nanoscience.
- Engineering: semiconductor design; earthquake and structural modeling; computational fluid dynamics (airplane design); combustion (engine design); crash simulation.
- Business: financial and economic modeling; transaction processing; web services and search engines.
- Defense: nuclear weapons (tested by simulation); cryptography.

Slide 8: Example: Global Climate Modeling Problem
- The problem is to compute f(latitude, longitude, elevation, time) → (temperature, pressure, humidity, wind velocity).
- Approach:
  - Discretize the domain, e.g., a measurement point every 10 km.
  - Devise an algorithm to predict the weather at time t + Δt given the state at time t.
- Uses:
  - Predict major events, e.g., El Niño.
  - Use in setting air emissions standards.
A minimal sketch of the discretization step follows.
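To make discretize-and-step concrete, here is a small C sketch (not from the lecture): the continuous function f is replaced by values stored at grid points, and each step computes a point's new value from its neighbors. The grid resolution, the single temperature field, and the diffusion-style update rule are all illustrative assumptions, not real model physics.

```c
#include <stdio.h>

#define NLAT 180   /* latitude grid points  (assumed resolution) */
#define NLON 360   /* longitude grid points (assumed resolution) */

/* One state variable (temperature) on a 2-D surface grid; a real climate
   model also carries pressure, humidity, wind, and vertical levels. */
static double temp[NLAT][NLON], next[NLAT][NLON];

/* Advance the discretized state from time t to t + dt. The "physics" is a
   placeholder diffusion-style neighbor average standing in for the real
   governing equations. */
static void step(double dt) {
    for (int i = 1; i < NLAT - 1; i++) {
        for (int j = 1; j < NLON - 1; j++) {
            double avg = 0.25 * (temp[i-1][j] + temp[i+1][j]
                               + temp[i][j-1] + temp[i][j+1]);
            next[i][j] = temp[i][j] + dt * (avg - temp[i][j]);
        }
    }
    for (int i = 1; i < NLAT - 1; i++)       /* double-buffer copy;   */
        for (int j = 1; j < NLON - 1; j++)   /* boundaries omitted    */
            temp[i][j] = next[i][j];
}

int main(void) {
    for (int t = 0; t < 10; t++)   /* ten time steps of size dt = 0.1 */
        step(0.1);
    printf("temp[90][180] = %f\n", temp[90][180]);
    return 0;
}
```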

Slide 9: High Resolution Climate Modeling on NERSC-3, P. Duffy et al., LLNL (figure slide).

Slide 10: Some Characteristics of Scientific Simulation
- Discretize physical or conceptual space into a grid: simpler if regular; possibly more representative if adaptive.
- Perform local computations on the grid: given yesterday's temperature and weather pattern, what is today's expected temperature?
- Communicate partial results between grids: contribute the local weather result to the understanding of the global weather pattern.
- Repeat for a set of time steps.
- Possibly perform other calculations with the results: given the weather model, what area should evacuate for a hurricane?
The sketch below shows how these steps fit together in a message-passing code.
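The pattern above maps directly onto message passing. Below is a hedged C/MPI sketch, assuming each process owns a slab of grid rows with one "ghost" row of neighbor data on each side; the slab size, step count, and in-place neighbor average are illustrative, and a production code would double-buffer and overlap communication with computation.

```c
#include <mpi.h>

#define NLON   360   /* grid columns (assumed) */
#define NROWS  64    /* grid rows owned by each process (assumed) */
#define NSTEPS 100

/* Local slab with one ghost row above (u[0]) and below (u[NROWS+1]). */
static double u[NROWS + 2][NLON];

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int up   = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int down = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    for (int t = 0; t < NSTEPS; t++) {
        /* Communicate partial results: exchange boundary rows with the
           neighboring processes (the "halo exchange"). */
        MPI_Sendrecv(u[1],         NLON, MPI_DOUBLE, up,   0,
                     u[NROWS + 1], NLON, MPI_DOUBLE, down, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(u[NROWS],     NLON, MPI_DOUBLE, down, 1,
                     u[0],         NLON, MPI_DOUBLE, up,   1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* Perform local computation on this process's portion of the grid
           (simplified in-place sweep; a real code would double-buffer). */
        for (int i = 1; i <= NROWS; i++)
            for (int j = 1; j < NLON - 1; j++)
                u[i][j] = 0.25 * (u[i-1][j] + u[i+1][j]
                                + u[i][j-1] + u[i][j+1]);
    }
    MPI_Finalize();
    return 0;
}
```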

Slide 11: More Examples: Parallel Computing in Data Analysis
- Finding information amidst large quantities of data.
- General themes of sifting through large, unstructured data sets:
  - Has there been an outbreak of some medical condition in a community?
  - Which doctors are most likely involved in fraudulent Medicare billing?
  - When should white socks go on sale?
  - Which advertisements should be sent to you?
- Data are collected and stored at enormous speeds (gigabytes/hour):
  - remote sensors on satellites,
  - telescopes scanning the skies,
  - microarrays generating gene expression data,
  - scientific simulations generating terabytes of data,
  - NSA analysis of telecommunications.

Slide 12: Why writing (fast) parallel programs is hard

Slide 13: Parallel Programming Complexity: An Analogy to Preparing Thanksgiving Dinner
- Enough parallelism? (Amdahl's Law) Suppose you want to serve just the turkey.
- Granularity: how frequently must each assistant report to the chef? After each stroke of a knife? Each step of a recipe? Each dish completed?
- Locality: grab the spices one at a time, or collect the ones needed before starting a dish?
- Load balance: does each assistant get one dish? Compare preparing stuffing with cooking green beans.
- Coordination and synchronization: the person chopping onions for the stuffing can also supply the green beans; start the pie after the turkey is out of the oven.
All of these issues make parallel programming even harder than sequential programming.

Slide 14: Finding Enough Parallelism
- Suppose only part of an application can be parallelized.
- Amdahl's Law: let s be the fraction of work done sequentially, so (1 - s) is the parallelizable fraction, and let P be the number of processors. Then

  Speedup(P) = Time(1) / Time(P) <= 1 / (s + (1 - s) / P) <= 1 / s

- Even if the parallel part speeds up perfectly, performance is limited by the sequential part (see the sketch below).
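A few lines of C make the bound tangible. The 5% sequential fraction below is an assumed example value, not a number from the slide.

```c
#include <stdio.h>

/* Amdahl's law: with sequential fraction s and P processors,
   Speedup(P) <= 1 / (s + (1 - s) / P), which is bounded above by 1/s. */
static double amdahl_speedup(double s, int p) {
    return 1.0 / (s + (1.0 - s) / p);
}

int main(void) {
    double s = 0.05;   /* assume 5% of the work is inherently sequential */
    for (int p = 1; p <= 1024; p *= 4)
        printf("P = %4d  speedup <= %6.2f\n", p, amdahl_speedup(s, p));
    printf("limit as P -> infinity: %.2f\n", 1.0 / s);
    return 0;
}
```

With s = 0.05, even 1024 processors yield a speedup below 20: a small sequential fraction dominates at scale.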

Slide 15: Overhead of Parallelism
- Given enough parallel work, overhead is the biggest barrier to achieving the desired speedup.
- Parallelism overheads include:
  - the cost of starting a thread or process,
  - the cost of communicating shared data,
  - the cost of synchronizing, and
  - extra (redundant) computation.
- Each of these can be in the range of milliseconds (i.e., millions of flops) on some systems.
- Tradeoff: the algorithm needs units of work large enough to run fast in parallel (i.e., coarse granularity), but not so large that there is too little parallel work. A sketch of this tradeoff follows.
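One way to see the granularity tradeoff is to move the fork/join boundary. In this OpenMP sketch (illustrative, not from the lecture; sizes are arbitrary), the first version pays thread-management overhead once per time step, the second only once overall; the measured gap varies widely, since many runtimes keep a thread pool.

```c
#include <stdio.h>
#include <omp.h>

#define STEPS 1000
#define N     10000
static double a[N];

int main(void) {
    /* Fine-grained: one parallel region (fork/join) per time step, so the
       startup overhead is paid STEPS times. */
    double t0 = omp_get_wtime();
    for (int t = 0; t < STEPS; t++) {
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] += 1.0;
    }
    double fine = omp_get_wtime() - t0;

    /* Coarse-grained: a single parallel region spans all steps; only the
       per-step work distribution and barrier remain inside. */
    t0 = omp_get_wtime();
    #pragma omp parallel
    for (int t = 0; t < STEPS; t++) {
        #pragma omp for
        for (int i = 0; i < N; i++)
            a[i] += 1.0;
    }
    double coarse = omp_get_wtime() - t0;

    printf("fine-grained: %.4fs  coarse-grained: %.4fs\n", fine, coarse);
    return 0;
}
```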

Slide 16: Locality and Parallelism
- Large memories are slow; fast memories are small.
- Storage hierarchies are large and fast on average.
- Parallel processors, collectively, have a large, fast cache; the slow accesses to "remote" data are what we call "communication."
- An algorithm should do most of its work on local data (see the sketch below).
[Figure: conventional storage hierarchy. Each processor has its own cache, L2 cache, and L3 cache backed by memory, with potential interconnects linking the processors.]
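At the single-node level, "do most work on local data" shows up even in a serial loop. This C sketch (matrix size assumed) walks the same matrix in memory order and then in stride-N order; on most machines the second traversal is several times slower because it wastes almost every cache line it loads.

```c
#include <stdio.h>
#include <time.h>

#define N 2048
static double m[N][N];

int main(void) {
    double sum = 0.0;
    clock_t t0 = clock();
    /* Good locality: row-major traversal matches the memory layout, so
       every element of each fetched cache line is used. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += m[i][j];
    double rows = (double)(clock() - t0) / CLOCKS_PER_SEC;

    t0 = clock();
    /* Poor locality: stride-N column traversal touches a new cache line
       on nearly every access. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += m[i][j];
    double cols = (double)(clock() - t0) / CLOCKS_PER_SEC;

    printf("row-major %.3fs, column-major %.3fs (sum = %g)\n",
           rows, cols, sum);
    return 0;
}
```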

Slide 17: Load Imbalance
- Load imbalance is the time that some processors in the system sit idle, due to:
  - insufficient parallelism (during that phase), or
  - unequally sized tasks.
- Examples of the latter:
  - adapting to "interesting parts of a domain,"
  - tree-structured computations,
  - fundamentally unstructured problems.
- The algorithm needs to balance the load (see the sketch below).
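One standard remedy is to hand out work on demand rather than in fixed blocks. In this OpenMP sketch (illustrative; the linearly growing cost function is a stand-in for "interesting parts of a domain"), schedule(static) would give the last thread the most expensive iterations, while schedule(dynamic) lets threads that finish early go back for more.

```c
#include <stdio.h>
#include <omp.h>

#define N 2000

/* Unequal tasks: the cost of task i grows linearly with i. */
static double work(int i) {
    double x = 0.0;
    for (int k = 0; k < i * 1000; k++)
        x += 1e-9 * (double)k;
    return x;
}

int main(void) {
    double total = 0.0;
    double t0 = omp_get_wtime();
    /* Dynamic scheduling: threads grab 8 iterations at a time, so a
       thread that draws cheap tasks simply requests another chunk. */
    #pragma omp parallel for schedule(dynamic, 8) reduction(+:total)
    for (int i = 0; i < N; i++)
        total += work(i);
    printf("total = %f in %.3fs\n", total, omp_get_wtime() - t0);
    return 0;
}
```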

Slide 18: New Parallel Programming Languages for Scientific Computing

Slide 19: A Brief Look at Chapel (Cray)
- History:
  - Starting in 2002, the Defense Advanced Research Projects Agency (DARPA) funded five industry players to investigate a revolutionary, commercially viable high-end computer system for 2010.
  - Two teams (IBM and Cray) are still building systems, to be deployed next year.
  - Along with Sun, both have introduced new languages: Chapel (Cray), X10 (IBM), and Fortress (Sun).
- We will look at Chapel over the next few slides.

Slide 20: Summary of Lecture
- Scientific simulation discretizes some space into a grid, then:
  - performs local computations on the grid,
  - communicates partial results between grids,
  - repeats for a set of time steps, and
  - possibly performs other calculations with the results.
- Writing fast parallel programs is difficult because of:
  - Amdahl's Law (you must parallelize most of the computation),
  - data locality,
  - communication and synchronization, and
  - load imbalance.
- The challenge for new productive parallel programming languages: express data partitioning and parallelism at a high level, and still obtain high performance!