CS4961 Parallel Programming
Lecture 1: Introduction
Mary Hall
August 23, 2011


Course Details
Time and Location: TuTh, 9:10-10:30 AM, WEB L112
Course Website
Instructor: Mary Hall
 -Office Hours: Tu 10:45-11:15 AM; Wed 11:00-11:30 AM
TA: Nikhil Mishrikoti
 -Office Hours: TBD
SYMPA mailing list
Textbook: "An Introduction to Parallel Programming," Peter Pacheco, Morgan Kaufmann Publishers, 2011. Readings and notes will also be provided for other topics as needed.

Administrative
Prerequisites:
 -C programming
 -Knowledge of computer architecture
 -CS4400 (concurrent enrollment may be OK)
Please do not bring laptops to class!
Do not copy solutions to assignments from the internet (e.g., Wikipedia).
Read Chapter 1 of the textbook by the next lecture.
We will discuss the first written homework assignment, due Aug. 30 before class (two problems and one question).
 -The homework is posted on the website and will be shown at the end of lecture.

Today's Lecture
-Overview of the course
-Important problems require powerful computers...
 -...and powerful computers must be parallel
 -Increasing importance of educating parallel programmers (you!)
-What sorts of architectures appear in this class
 -Multimedia extensions, multi-cores, GPUs, networked clusters
-Developing high-performance parallel applications
 -An optimization perspective

Outline
-Logistics
-Introduction
-Technology drivers for the multi-core paradigm shift
-Origins of parallel programming: large-scale scientific simulations
-The fastest computer in the world today
-Why writing fast parallel programs is hard
-Algorithm activity
Material for this lecture is drawn from: the textbook; Kathy Yelick and Jim Demmel, UC Berkeley; Quentin Stout, University of Michigan; and the Top 500 list.

Course Objectives
-Learn how to program parallel processors and systems
 -Learn how to think in parallel and write correct parallel programs
 -Achieve performance and scalability through an understanding of architecture and software mapping
-Significant hands-on programming experience
 -Develop real applications on real hardware
-Discuss the current parallel computing context
 -What are the drivers that make this course timely?
 -Contemporary programming models and architectures, and where the field is going

Why is this Course Important?
-The multi-core and many-core era is here to stay
 -Why? Technology trends
-Many programmers will be developing parallel software
 -But still not everyone is trained in parallel programming
 -Learn how to put all these vast machine resources to the best use!
-Useful for
 -Joining the work force
 -Graduate school
-Our focus
 -Teach core concepts
 -Use common programming models
 -Discuss the broader spectrum of parallel computing

Parallel and Distributed Computing
Parallel computing (processing):
-the use of two or more processors (computers), usually within a single system, working simultaneously to solve a single problem.
Distributed computing (processing):
-any computing that involves multiple computers remote from each other, each of which has a role in a computation problem or information processing.
Parallel programming:
-the human process of developing programs that express what computations should be executed in parallel. (A minimal example appears below.)
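To make these definitions concrete, here is a minimal sketch of a parallel program in C with Pthreads, one of the models the textbook covers: four threads work simultaneously on a single problem, each summing one chunk of an array. The thread count, array size, and names are illustrative choices, not anything prescribed by the course.

    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000
    #define NTHREADS 4

    static double a[N];
    static double partial[NTHREADS];

    /* Each thread sums its contiguous chunk of the array. */
    static void *sum_chunk(void *arg) {
        long t = (long)arg;
        long lo = t * (N / NTHREADS);
        long hi = (t == NTHREADS - 1) ? N : lo + N / NTHREADS;
        double s = 0.0;
        for (long i = lo; i < hi; i++)
            s += a[i];
        partial[t] = s;
        return NULL;
    }

    int main(void) {
        pthread_t tid[NTHREADS];
        for (long i = 0; i < N; i++)
            a[i] = 1.0;
        for (long t = 0; t < NTHREADS; t++)
            pthread_create(&tid[t], NULL, sum_chunk, (void *)t);
        double total = 0.0;
        for (long t = 0; t < NTHREADS; t++) {
            pthread_join(tid[t], NULL);  /* wait for the thread, then combine */
            total += partial[t];
        }
        printf("sum = %f\n", total);     /* expect 1000000.0 */
        return 0;
    }

Compile with a pthreads-enabled compiler, e.g., gcc -pthread.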

Detour: Technology as Driver for the "Multi-Core" Paradigm Shift
Do you know why most computers sold today are parallel computers?
Let's talk about the technology trends.

Technology Trends: Microprocessor Capacity
(Slide source: Maurice Herlihy)
-Clock speed is flattening sharply
-Transistor count is still rising
Moore's Law: Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would double roughly every 18 months.

The Multi-Core or Many-Core Paradigm Shift
What to do with all these transistors?
Key ideas:
-Movement away from increasingly complex processor designs and ever-faster clocks
-Replicated functionality (i.e., parallelism) is simpler to design
-Resources are more efficiently utilized
-Huge power management advantages
All computers are parallel computers.

Proof of Significance: Popular Press
The August 2009 issue of Newsweek ran an article on 25 things "smart people" should know.

Scientific Simulation: The Third Pillar of Science
The traditional scientific and engineering paradigm:
1) Do theory or paper design.
2) Perform experiments or build a system.
Limitations:
-Too difficult: building large wind tunnels.
-Too expensive: building a throw-away passenger jet.
-Too slow: waiting for climate or galactic evolution.
-Too dangerous: weapons, drug design, climate experimentation.
The computational science paradigm adds:
3) Use high-performance computer systems to simulate the phenomenon,
 -based on known physical laws and efficient numerical methods.

The Quest for Increasingly More Powerful Machines
Scientific simulation will continue to push on system requirements:
-To increase the precision of the result
-To get to an answer sooner (e.g., climate modeling, disaster modeling)
The U.S. will continue to acquire systems of increasing scale:
-For the above reasons
-And to maintain competitiveness

A Similar Phenomenon in Commodity Systems
-More capabilities in software
-Integration across software
-Faster response
-More realistic graphics
-...

Example: Global Climate Modeling Problem
The problem is to compute:
 f(latitude, longitude, elevation, time) → temperature, pressure, humidity, wind velocity
Approach:
-Discretize the domain, e.g., a measurement point every 10 km
-Devise an algorithm to predict the weather at time t + Δt given the state at time t
Uses:
-Predict major events, e.g., El Niño
-Use in setting air emissions standards

High Resolution Climate Modeling on NERSC-3 (P. Duffy, et al., LLNL)
(Figure slide: simulation output images.)

Some Characteristics of Scientific Simulation
-Discretize physical or conceptual space into a grid
 -Simpler if regular; may be more representative if adaptive
-Perform local computations on the grid
 -Given yesterday's temperature and weather pattern, what is today's expected temperature?
-Communicate partial results between grids
 -Contribute the local weather result to understanding the global weather pattern.
-Repeat for a set of time steps
-Possibly perform other calculations with the results
 -Given the weather model, what area should evacuate for a hurricane?
A sketch of this compute-then-step pattern appears below.
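The "local computation on a grid, repeated over time steps" pattern can be made concrete with a short serial C sketch of a five-point stencil update. The grid size, step count, and averaging rule here are arbitrary illustrative choices, not values from the lecture.

    #include <string.h>

    #define NX 128
    #define NY 128
    #define NSTEPS 100

    static double grid[NX][NY], next[NX][NY];

    /* Local computation: each interior point is recomputed from its
       four neighbors (e.g., today's value from yesterday's neighborhood). */
    static void time_step(void) {
        for (int i = 1; i < NX - 1; i++)
            for (int j = 1; j < NY - 1; j++)
                next[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                     grid[i][j-1] + grid[i][j+1]);
        memcpy(grid, next, sizeof(grid));
    }

    int main(void) {
        for (int t = 0; t < NSTEPS; t++)  /* repeat for a set of time steps */
            time_step();
        return 0;
    }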

Example of Discretizing a Domain
(Figure slide: a 2D domain divided into a grid of blocks.)
-One processor computes one block while another processor computes a neighboring block in parallel.
-Processors in adjacent blocks in the grid communicate their results.
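On a distributed-memory machine, "communicate results between adjacent blocks" is typically a halo (ghost-cell) exchange. Below is a minimal MPI sketch, assuming a 1D decomposition of the grid into strips of rows; the buffer layout and sizes are illustrative assumptions, and error handling is omitted.

    #include <mpi.h>

    #define NY 128
    #define LOCAL_NX 32   /* rows owned by this process (illustrative) */

    /* Local rows plus one ghost row from each neighboring process. */
    static double u[LOCAL_NX + 2][NY];

    /* Exchange boundary rows with the processes above and below. */
    static void halo_exchange(int rank, int nprocs) {
        int up   = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
        int down = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

        /* Send my top owned row up; receive my bottom ghost row from below. */
        MPI_Sendrecv(u[1], NY, MPI_DOUBLE, up, 0,
                     u[LOCAL_NX + 1], NY, MPI_DOUBLE, down, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* Send my bottom owned row down; receive my top ghost row from above. */
        MPI_Sendrecv(u[LOCAL_NX], NY, MPI_DOUBLE, down, 1,
                     u[0], NY, MPI_DOUBLE, up, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    int main(int argc, char **argv) {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        halo_exchange(rank, nprocs);
        MPI_Finalize();
        return 0;
    }

After the exchange, each process can apply the stencil from the earlier sketch to its own block, using the ghost rows for boundary values.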

Parallel Programming Complexity: An Analogy to Preparing Thanksgiving Dinner
-Enough parallelism? (Amdahl's Law)
 -Suppose you want to just serve turkey
-Granularity
 -How frequently must each assistant report to the chef?
 -After each stroke of a knife? Each step of a recipe? Each dish completed?
-Locality
 -Grab the spices one at a time? Or collect the ones needed before starting a dish?
-Load balance
 -Does each assistant get a dish? Preparing stuffing vs. cooking green beans?
-Coordination and synchronization
 -The person chopping onions for the stuffing can also supply green beans
 -Start the pie after the turkey is out of the oven
All of these things make parallel programming even harder than sequential programming.

Finding Enough Parallelism
Suppose only part of an application seems parallel.
Amdahl's Law:
-Let s be the fraction of work done sequentially, so (1-s) is the fraction that is parallelizable.
-Let P be the number of processors.
 Speedup(P) = Time(1)/Time(P) <= 1/(s + (1-s)/P) <= 1/s
Even if the parallel part speeds up perfectly, performance is limited by the sequential part.
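A quick numeric check of this bound makes it tangible; the small C program below is illustrative only (the value of s is an arbitrary choice). With s = 0.1, no number of processors can deliver more than a 10x speedup.

    #include <stdio.h>

    /* Upper bound on speedup from Amdahl's Law:
       s = sequential fraction, p = number of processors. */
    static double amdahl_bound(double s, int p) {
        return 1.0 / (s + (1.0 - s) / p);
    }

    int main(void) {
        double s = 0.1;  /* 10% of the work is sequential */
        for (int p = 1; p <= 1024; p *= 4)
            printf("P = %4d  speedup <= %.2f\n", p, amdahl_bound(s, p));
        printf("limit as P grows: %.2f\n", 1.0 / s);
        return 0;
    }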

Overhead of Parallelism
Given enough parallel work, this is the biggest barrier to getting the desired speedup.
Parallelism overheads include:
-the cost of starting a thread or process
-the cost of communicating shared data
-the cost of synchronizing
-extra (redundant) computation
Each of these can be in the range of milliseconds (= millions of flops) on some systems.
Tradeoff: the algorithm needs sufficiently large units of work to run fast in parallel (i.e., large granularity), but not so large that there is not enough parallel work.
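These overheads are easy to measure directly. The sketch below times thread creation and join with Pthreads on a POSIX system; the trial count is arbitrary, and results vary widely by platform.

    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    #define TRIALS 1000

    static void *noop(void *arg) { return arg; }  /* thread does no work */

    int main(void) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < TRIALS; i++) {
            pthread_t tid;
            pthread_create(&tid, NULL, noop, NULL);
            pthread_join(tid, NULL);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("create + join: %.0f ns per thread\n", ns / TRIALS);
        return 0;
    }

Even this lower bound on thread cost dwarfs a single arithmetic operation, which is why very fine-grained units of work do not pay off.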

Locality and Parallelism
-Large memories are slow; fast memories are small.
-A program should do most of its work on local data.
(Figure: the conventional storage hierarchy, with each processor backed by its own cache, L2 cache, L3 cache, and memory, and potential interconnects between the memories.)
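A standard single-node illustration of locality (not from the slides) is loop order over a row-major C array: walking along rows uses each cache line fully, while walking down columns touches a new line on nearly every access. The array size is an arbitrary choice.

    #include <stdio.h>

    #define N 4096
    static double m[N][N];

    /* Good locality: the inner loop visits consecutive addresses
       (C arrays are row-major), so each cache line is fully used. */
    static double sum_row_major(void) {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += m[i][j];
        return s;
    }

    /* Poor locality: the inner loop strides N * sizeof(double) bytes,
       so nearly every access misses in the cache. */
    static double sum_col_major(void) {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += m[i][j];
        return s;
    }

    int main(void) {
        printf("%f %f\n", sum_row_major(), sum_col_major());
        return 0;
    }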

Load Imbalance
Load imbalance is the time that some processors in the system sit idle due to:
-insufficient parallelism (during that phase)
-unequal task sizes
Examples of the latter:
-adapting to "interesting parts of a domain"
-tree-structured computations
-fundamentally unstructured problems
The algorithm needs to balance the load.
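One common remedy, sketched below in OpenMP (an assumption; the lecture does not prescribe a model, and the task cost function is made up), is to hand out work dynamically rather than in fixed equal-count chunks.

    #include <stdio.h>

    #define NTASKS 1000

    /* Illustrative task whose cost varies wildly with i,
       standing in for "unequal size tasks". */
    static double do_task(int i) {
        double s = 0.0;
        for (int k = 0; k < (i % 100) * 1000; k++)
            s += k * 1e-9;
        return s;
    }

    static double run_balanced(void) {
        double total = 0.0;
        /* schedule(dynamic) hands out iterations on demand, so a thread
           that draws long tasks does not leave others idle; with
           schedule(static), iterations are pre-split evenly by count,
           not by cost, which can cause load imbalance. */
        #pragma omp parallel for schedule(dynamic) reduction(+:total)
        for (int i = 0; i < NTASKS; i++)
            total += do_task(i);
        return total;
    }

    int main(void) {
        printf("total = %f\n", run_balanced());
        return 0;
    }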

Homework 1: Parallel Programming Basics
Turn in electronically on the CADE machines using the handin program: "handin cs4961 hw1 <filename>"
Problem 1: #1.3 in the textbook.
Problem 2: I recently had to tabulate results from a written survey that had four categories of respondents: (I) students; (II) academic professionals; (III) industry professionals; and (IV) other. The respondents selected the category to which they belonged and then answered 32 questions with five possible responses: (i) strongly agree; (ii) agree; (iii) neutral; (iv) disagree; and (v) strongly disagree. My family members and I tabulated the results "in parallel" (assume there were four of us).
-(a) Identify how data parallelism can be used to tabulate the results of the survey. Keep in mind that each individual survey is on a separate sheet of paper that only one "processor" can examine at a time. Identify scenarios that might lead to load imbalance with a purely data-parallel scheme.
-(b) Identify how task parallelism, and combined task and data parallelism, can be used to tabulate the results of the survey so as to improve upon the load imbalance you identified.

Homework 1, cont.
Problem 3: What are your goals after this year, and how do you anticipate this class will help you with them? Some possible answers are listed below, but please feel free to add to them. Also, please write at least one sentence of explanation.
-A job in the computing industry
-A job in some other industry that uses computing
-Preparation for graduate studies
-To satisfy intellectual curiosity about the future of the computing field
-Other

Summary of Lecture
Solving the "parallel programming problem":
-A key technical challenge facing today's computing industry, government agencies, and scientists
Scientific simulation discretizes some space into a grid, then:
-Performs local computations on the grid
-Communicates partial results between grids
-Repeats for a set of time steps
-Possibly performs other calculations with the results
Commodity parallel programming can draw from this history and move forward in a new direction.
Writing fast parallel programs is difficult:
-Amdahl's Law: must parallelize most of the computation
-Data locality
-Communication and synchronization
-Load imbalance

Next Time
-An exploration of parallel algorithms and their features
-First written homework assignment