Introduction: What Are Parallel Algorithms? Why Parallel Algorithms? Evolution and Convergence of Parallel Algorithms. Fundamental Design Issues.

Presentation transcript:


What is a Parallel Algorithm?
An algorithm that uses multiple hardware and software components.
Levels of parallelism:
  – Circuits: VLSI arithmetic
  – Functional units: instruction level
  – Processors
  – Processes
  – Tasks (jobs)
  – Data
Closely related to parallel architectures and computation models.

What is Parallel Architecture?
A parallel computer is a collection of processing elements that cooperate to solve problems fast.
Issues:
  – Resources: How large a collection? How powerful are the processing elements?
  – Communication and synchronization: How do the elements cooperate and communicate? How are data transmitted between processors? What are the abstractions and primitives for cooperation?
  – Performance and scalability: How does it all translate into performance? How does it scale?

Inevitability of Parallel Computing
Application demands: demand for computing cycles
  – Scientific computing: CFD, biology, chemistry, physics, ...
  – General-purpose computing: video, graphics, CAD, databases, TP, ...
Technology trends
  – Number of transistors on chip growing rapidly
  – Clock rates expected to go up only slowly
Architecture trends
  – Instruction-level parallelism valuable but limited
  – Coarser-level parallelism, as in multiprocessors (MPs), the most viable approach
Economics
Current trends
  – Today’s microprocessors have multiprocessor support
  – Servers and workstations becoming MP: Sun, SGI, DEC, COMPAQ, ...
  – Tomorrow’s microprocessors are multiprocessors

Driving Force of Parallel Architectures
Application trends
  – Applications provide wish lists (Grand Challenge problems)
  – Weather simulation, data mining, N-body
  – Some problems require teraflops (10^12) to petaflops (10^15)
VLSI technology trends
  – Smaller minimum feature size (transistor size)
  – Increasing die size
  – => Transistor count doubles every 3 years
Architectural trends
  – Trends in computer performance: annual rate of 1.35 before 1985, 1.85 after 1986 (RISC)
Programming model convergence
Economics

Application Trends
Transition to parallel computing has occurred for scientific and engineering computing.
In rapid progress in commercial computing
  – Database and transactions as well as financial
  – Usually smaller-scale, but large-scale systems also used
Desktop also uses multithreaded programs, which are a lot like parallel programs.
Demand for improving throughput on sequential workloads
  – Greatest use of small-scale multiprocessors
Solid application demand exists and will increase.

General Technology Trends
Microprocessor performance increases 50% - 100% per year.
Transistor count doubles every 3 years.
DRAM size quadruples every 3 years.
Huge investment per generation is carried by a huge commodity market.
The point is not that single-processor performance is plateauing, but that parallelism is a natural way to improve it.
[Figure: integer and FP performance of Sun, MIPS M/120, IBM RS, MIPS M2000, HP, and DEC Alpha systems]
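The multi-year growth figures above can be put on the same per-year footing as the performance figure. The following is a minimal sketch; the helper name `annual_growth` is made up for illustration and is not part of the slides.

```python
def annual_growth(factor, years):
    """Equivalent per-year multiplier for growth by `factor` over `years` years."""
    return factor ** (1.0 / years)

# Figures taken from the slide: doubling and quadrupling every 3 years.
transistor_rate = annual_growth(2, 3)
dram_rate = annual_growth(4, 3)

print(f"transistor count: x{transistor_rate:.3f} per year")
print(f"DRAM size:        x{dram_rate:.3f} per year")
```

Under these assumptions transistor count grows by roughly 26% per year and DRAM size by roughly 59% per year, which can be set beside the 50% - 100% per year quoted for microprocessor performance.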

Technology: A Closer Look
Basic advance is decreasing feature size (λ)
  – Circuits become either faster or lower in power
Die size is growing too
  – Clock rate improves roughly in proportion to the improvement in λ
  – Number of transistors improves like λ² (or faster)
Performance improves more than 100x per decade: clock rate contributes about 10x, the rest comes from transistor count.
How to use more transistors?
  – Parallelism in processing: multiple operations per cycle reduces CPI
  – Locality in data access: avoids latency and reduces CPI; also improves processor utilization
  – Both need resources, so there is a tradeoff
Fundamental issue is resource distribution, as in uniprocessors.
[Figure: processor (Proc), cache ($), and interconnect]
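The scaling argument above can be worked through numerically. This is only a sketch: `shrink` is a hypothetical name for the ratio of old to new feature size, die area is assumed fixed (a growing die would add more transistors on top), and the 100x and 10x per-decade figures are the ones quoted on the slide.

```python
def scaling_gains(shrink):
    """Return (clock_gain, transistor_gain) for a feature-size shrink factor.

    shrink = old feature size / new feature size, so shrink > 1 means
    smaller features. Clock rate is taken as proportional to the shrink,
    transistor count as proportional to its square (fixed die area).
    """
    return shrink, shrink ** 2

def per_year(decade_factor):
    """Per-year multiplier equivalent to a given per-decade factor."""
    return decade_factor ** 0.1

# Halving the feature size: about 2x clock rate, about 4x transistors.
clock_gain, transistor_gain = scaling_gains(2.0)

# Slide figures: >100x performance per decade, of which 10x is clock rate.
perf_decade = 100.0
clock_decade = 10.0
other_decade = perf_decade / clock_decade  # share left to transistor count

print(f"clock x{clock_gain}, transistors x{transistor_gain}")
print(f"performance per year: x{per_year(perf_decade):.3f}")
print(f"left for transistor use per decade: x{other_decade}")
```

A 100x gain per decade corresponds to roughly 58% per year, and with 10x attributed to clock rate the remaining 10x has to come from using the extra transistors well.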

Economics
Commodity microprocessors are not only fast but CHEAP.
Development cost is tens of millions of dollars (5-100 typical).
BUT, many more are sold compared to supercomputers
  – Crucial to take advantage of the investment, and use the commodity building block
  – Exotic parallel architectures no more than special-purpose
Multiprocessors are being pushed by software vendors (e.g. database) as well as hardware vendors.
Standardization by Intel makes small, bus-based SMPs a commodity.
Desktop: a few smaller processors versus one larger one?
  – Multiprocessor on a chip

Challenge of Parallel Processing
Microprocessors and memory are cheap. Connect bunches of them together and use them either asynchronously or synchronously.
Example:
  – Each board: 100 microprocessors of 10 MFLOPS each
  – Each system: 100 boards
  – Speed: 100 GFLOPS
Reasons this is a challenge:
  – Cache coherency
  – Extra design time for bus and interface
  – Compiler may not find parallelism easily
  – Communication and I/O complexity
  – Economics
  – Amdahl's Law
  – Writing parallel programs is not easy
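The peak figure in the example follows from simple multiplication. The sketch below just redoes that arithmetic; the variable names are illustrative, and the result is a peak rate that the listed obstacles keep real programs from reaching.

```python
# Figures from the slide's example, in floating-point operations per second.
flops_per_processor = 10_000_000   # 10 MFLOPS per microprocessor
processors_per_board = 100
boards_per_system = 100

board_peak = flops_per_processor * processors_per_board
system_peak = board_peak * boards_per_system

print(f"board peak:  {board_peak / 1e9:.0f} GFLOPS")
print(f"system peak: {system_peak / 1e9:.0f} GFLOPS")
```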

Amdahl's Law
Speedup S is bounded by
$$S \le \frac{1}{f + (1-f)/p}$$
where p is the number of PEs and f is the fraction of the program that is serial.
Example: if 10% of a program must run sequentially, the maximum speedup is 10.
To achieve S=80 for p=100, 99.75% of the code should be parallel.
How to overcome the law?
  – Increase problem size: scaled speedup
  – Develop parallel algorithms that maximize the parallelism and minimize the sequential operations

How to Solve Communication Overhead
  – Increase communication bandwidth
  – Decrease communication latency
  – Communication latency hiding
  – Increase granularity
(Figure 8.4 of Hennessy's book)
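The bound can be checked numerically. The sketch below assumes the usual form of Amdahl's Law, S = 1 / (f + (1-f)/p), with f the serial fraction and p the number of PEs. The function names are illustrative, and `scaled_speedup` assumes that "scaled speedup" refers to Gustafson's formulation, which the slides do not spell out.

```python
def amdahl_speedup(f, p):
    """Upper bound on speedup with serial fraction f on p processing elements."""
    return 1.0 / (f + (1.0 - f) / p)

def max_serial_fraction(target, p):
    """Largest serial fraction that still allows speedup `target` on p PEs.

    Obtained by solving 1/target = f + (1 - f)/p for f.
    """
    return (p / target - 1.0) / (p - 1.0)

def scaled_speedup(f, p):
    """Gustafson-style scaled speedup (assumed meaning of 'scaled speedup').

    Here f is the serial fraction of the run time on the parallel machine,
    with the problem size growing along with p.
    """
    return f + (1.0 - f) * p

# 10% serial: the bound approaches 10 however many PEs are added.
for p in (10, 100, 1000):
    print(f"f=0.10, p={p:4d}: S <= {amdahl_speedup(0.10, p):.2f}")

# Parallel fraction needed on 100 PEs for two target speedups.
for target in (20, 80):
    parallel = 1.0 - max_serial_fraction(target, 100)
    print(f"S={target}, p=100: parallel fraction >= {parallel:.4%}")

print(f"scaled speedup, f=0.10, p=100: {scaled_speedup(0.10, 100):.1f}")
```

With p=100, a target of S=20 needs about 96% of the code to be parallel, while S=80 needs about 99.75%.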