CSE 260 – Introduction to Parallel Computation
Larry Carter, 9/20/2001
Office Hours: AP&M 4101, MW 10:00–11:00, or by appointment

Slide 2: Topics
A matrix of topics, with columns Instances / Principles / Theory and rows:
– Hardware: specific machines; parallelism, pipelining, ...; limits to performance
– Languages
– Applications
– Algorithms
– Systems

Slide 3: Emphasis of course
– Scientific computation (I'll mostly ignore commercial computing, even though it's important)
– Supercomputers and supercomputing
– Applications
– Focus on topics of practical significance

Slide 4: This is a graduate course...
– If you want to shift emphasis, let's talk! E.g. you might want to do a different project.
– Surprise me! Do something extra occasionally (write mini-studies).
– Question authority! Questions and discussions are encouraged; I'm opinionated and sometimes wrong.
– Listen to your classmates! They have insights from different perspectives.

Slide 5: Syllabus
– Weeks 1–4: Whirlwind overview (20%)
  – Learn the vocabulary used in the field
  – Build a mental "filing cabinet" for organizing later topics
  – Three quizzlets
– Weeks 5–10: Selected topics in depth (25%)
  – Important and/or interesting papers
  – Give a presentation, or (perhaps) write a critique
– Project: one application in various languages (35%)
  – All on the same computer (Sun Ultra at SDSC)
– Mini-projects (20%)
  – 5-minute report supplementing class material

Slide 6: Vocabulary (first three weeks)
Know the terms that are underlined.*
– They should be in your passive vocabulary (i.e., if someone uses a term, you should have a reasonably good idea what it means).
– They need not be in your active vocabulary (i.e., I don't expect you to be able to list all the terms).
Quizzlets will be multiple choice or fill-in-the-blanks, not essays.
* Unfortunately, PowerPoint may also underline misspellings.

Slide 7: Any Administrative Questions?

Slide 8: Class 1: Parallel Architectures
Interesting reading: Chapter 9 of Patterson & Hennessy's undergraduate text (second edition), or Chapter 10 of Hennessy & Patterson's graduate text.
Parallel computer:
– Almasi & Gottlieb: "a large collection of processing elements that can communicate and cooperate to solve large problems fast."
– Many "processing elements" cooperating on a single problem.
– Is a multiprocessor server one? Not large enough. Are networks of workstations, or internet-connected computers working together? Sure.
Supercomputer: a computer costing $3,000,000 or more.

Slide 9: Why bother?
Gene Amdahl (!): "For over a decade, prophets have voiced the contention that the organization of a single computer has reached its limit." He went on to argue that the single-processor approach to large-scale computing was still viable.
Parallel computing is expensive:
– Higher cost per cycle
– Greater programming effort
– Less convenient access
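Amdahl's skepticism was later formalized as Amdahl's law (not stated on the slide, but standard): if only a fraction p of a program's work can be parallelized, n processors give a speedup of 1/((1 − p) + p/n), bounded by 1/(1 − p) no matter how large n grows. A minimal sketch:

```python
def amdahl_speedup(p, n):
    """Speedup on n processors when a fraction p of the work parallelizes."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 90% parallelizable work, 10 processors give only ~5.3x,
# and the speedup can never exceed 10x regardless of processor count.
print(amdahl_speedup(0.9, 10))     # ~5.26
print(amdahl_speedup(0.9, 10**6))  # ~10.0
```

This is one quantitative reason "why bother?" is a fair question: the serial fraction, not the processor count, dominates.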

Slide 10: Possible answers...
– Answers today are more valuable than answers tomorrow: weather prediction, conference submissions, product design (airplanes, drugs, ...).
– Some problems require huge memories, and once you have a huge memory, it's more economical to have multiple processors.

Slide 11: Von Neumann Bottleneck
"The instruction stream is inherently sequential – there is one processing site and all instructions, operands and results must flow through a bottleneck between processors and memory."
The goal of parallel computers is to overcome the von Neumann bottleneck.
[Diagram: a processor P connected to a memory M]
(The term was introduced by John Backus in 1978, referring to the design described by John von Neumann in 1945. An underlined vocabulary term.)

Slide 12: Flynn's Taxonomy
Flynn (1966) classified machines by their data and control streams:
– Single Instruction, Single Data (SISD)
– Single Instruction, Multiple Data (SIMD)
– Multiple Instruction, Single Data (MISD)
– Multiple Instruction, Multiple Data (MIMD)
(These are underlined vocabulary terms.)

Slide 13: SISD
– Model of the serial von Neumann machine
– Logically, a single control processor
– Includes some supercomputers, such as the 1963 CDC 6600 (perhaps the first supercomputer)
[Diagram: a processor P connected to a memory M]

Slide 14: SIMD
– Multiple processors executing the same program in lockstep
– The data each processor sees may be different
– Single control processor
– Individual processors can be turned on/off at each cycle ("masking")
– Examples: Illiac IV, Thinking Machines' CM-2, MasPar, DAP, Goodyear MPP
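The lockstep-with-masking model can be illustrated in plain Python (a toy simulation of the abstraction, not how SIMD hardware is actually programmed): one instruction is broadcast to all lanes, and masked-off lanes simply keep their old values.

```python
def simd_step(op, lanes, mask):
    """One SIMD cycle: apply the same instruction to every active lane.

    Lanes where mask is False are "turned off" and keep their old value.
    """
    return [op(x) if active else x for x, active in zip(lanes, mask)]

lanes = [1, 2, 3, 4]
mask = [True, False, True, True]  # lane 1 is masked off this cycle
print(simd_step(lambda x: x * 2, lanes, mask))  # [2, 2, 6, 8]
```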

Slide 15: The ill-fated Illiac IV
– Project started in 1965; predicted to cost $8M and provide 1000 MFLOP/s.
– Delivered to NASA Ames in 1972; cost $31M; ran its first application in 1976; performed 15 MFLOP/s.
– 64 processors, 13 MHz clock, 1 MB memory.
– Meanwhile, the CDC 7600 (basically a superscalar uniprocessor) was 36 MHz, 36 MFLOP/s, 3.75 MB memory, $5.1M, and running in 1969.

Slide 16: CM2 Architecture
The CM2 (1990, built by Thinking Machines Corp.) had 8,192 to 65,536 one-bit processors, plus one floating-point unit per 64(?) processors. A Data Vault provided peripheral mass storage. Single program – all unmasked operations happened in parallel.
[Diagram: four 16K-processor sections (Seq 0–3) joined by a Nexus to four front ends, with a CM I/O System connecting the Data Vault and a graphic display]

Slide 17: Vector Computers
A hybrid of SISD and SIMD: has ordinary "scalar" operations plus "vector operations," which operate on up to (say) 256 independent sets of operands (held in "vector registers") in fast pipelined mode.
– Examples: Cray supercomputers (X-MP, Y-MP, C90, T90, SV1, ...), Fujitsu (VPPxxx), NEC, Hitachi.
– Many of these have multiple vector processors, but typically the separate processors are used for separate jobs.
– 4- or 8-way SIMD is also used in graphics and multimedia accelerators and video game machines.
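Long loops are mapped onto fixed-length vector registers by processing them in strips ("strip-mining"). A rough Python model, where the vector length and the axpy operation are illustrative choices, not tied to any particular machine:

```python
VLEN = 256  # assumed vector register length, e.g. a Cray-style machine

def vector_axpy(a, x, y):
    """Compute y = a*x + y by processing strips of at most VLEN elements.

    Each slice assignment below stands in for a single vector instruction
    operating on a whole vector register's worth of operands.
    """
    out = list(y)
    for start in range(0, len(x), VLEN):
        strip = slice(start, min(start + VLEN, len(x)))
        out[strip] = [a * xi + yi for xi, yi in zip(x[strip], out[strip])]
    return out

print(vector_axpy(2, [1, 2, 3], [10, 10, 10]))  # [12, 14, 16]
```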

Slide 18: MIMD
– All processors execute their own set of instructions
– Processors operate on separate data streams
– May have separate clocks
– Examples: IBM SPs, TMC's CM-5, Cray T3D & T3E, SGI Origin, Tera MTA, clusters, etc.
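The defining feature (independent instruction streams on independent data) can be sketched with Python threads; this models only the programming abstraction, not real parallel hardware:

```python
import threading

# Two "processors", each running a different instruction stream
# (sum vs. max) on its own data stream.
results = [None, None]

def run(idx, fn, data):
    results[idx] = fn(data)

t0 = threading.Thread(target=run, args=(0, sum, [1, 2, 3]))
t1 = threading.Thread(target=run, args=(1, max, [4, 9, 6]))
t0.start(); t1.start()
t0.join(); t1.join()
print(results)  # [6, 9]
```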

Slide 19: SP2 High-Performance Switch
The high-performance switch of a 64-node SP2:
– Multiple paths between any two nodes
– Network scales as nodes are added
[Diagram: four groups of nodes connected through a central switchboard]

Slide 20: Some more MIMD computers
– Cluster: computers connected over a high-bandwidth local area network (usually Ethernet or Myrinet), used as a parallel computer.
– NOW (Network Of Workstations): a homogeneous cluster (all computers on the network are the same model).
– "The Grid": computers connected over a wide area network.

Slide 21: Larry's conjecture
SIMD is used on early machines in a given generation; it then gives way to MIMD.
– When space is scarce, you can save by having only one control unit.
– As components shrink and memory becomes cheaper, the flexibility of MIMD prevails.
(Conceivable mini-project: find evidence for or against Larry's conjecture.)

Slide 22: What about MISD?
Multiple Instruction, Single Data. The term isn't used (except when discussing Flynn's taxonomy). It perhaps applies to pipelined computation, e.g. sonar data passing through a sequence of special-purpose signal processors. (Not an underlined vocabulary term.)
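The signal-processing reading of MISD can be sketched as a pipeline of stages through which a single data stream flows, each stage applying its own "instruction"; the stages below are made-up examples, not actual signal-processing kernels:

```python
def scale(stream):
    """Stage 1: applies its own fixed operation to every item flowing past."""
    for x in stream:
        yield 2 * x

def offset(stream):
    """Stage 2: a different operation on the same (single) data stream."""
    for x in stream:
        yield x + 1

# One data stream, multiple instruction streams applied in sequence.
print(list(offset(scale(range(4)))))  # [1, 3, 5, 7]
```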