ECE 669 Parallel Computer Architecture, Lecture 15: Mid-term Review (March 25, 2004)


Is Parallel Computing Inevitable?
°Application demands: our insatiable need for computing cycles
°Technology trends
°Architecture trends
°Economics
°Current trends:
Today's microprocessors have multiprocessor support
Servers and workstations are becoming MPs: Sun, SGI, DEC, HP...
Tomorrow's microprocessors are multiprocessors

Application Trends
°Application demand for performance fuels advances in hardware, which enable new applications, which demand still more performance...
This cycle drives the exponential increase in microprocessor performance
It drives parallel architecture even harder, toward the most demanding applications
°Range of performance demands
Need a range of system performance with progressively increasing cost
(Figure: cycle between New Applications and More Performance)

Architectural Trends
°Architecture translates technology's gifts into performance and capability
°Resolves the tradeoff between parallelism and locality
Current microprocessor: 1/3 compute, 1/3 cache, 1/3 off-chip connect
Tradeoffs may change with scale and technology advances
°Understanding microprocessor architectural trends
=> Helps build intuition about design issues for parallel machines
=> Shows the fundamental role of parallelism even in "sequential" computers

Phases in "VLSI" Generation

Programming Model
°Look at the major programming models
Where did they come from?
What do they provide?
How have they converged?
°Extract general structure and fundamental issues
°Reexamine traditional camps from a new perspective
(Figure: SIMD, Message Passing, Shared Memory, Dataflow, and Systolic Arrays converging toward a generic architecture)

Programming Model
°The conceptualization of the machine that the programmer uses in coding applications
How parts cooperate and coordinate their activities
Specifies communication and synchronization operations
°Multiprogramming: no communication or synchronization at the program level
°Shared address space: like a bulletin board
°Message passing: like letters or phone calls; explicit point-to-point
°Data parallel: more regimented; global actions on data
Implemented with a shared address space or message passing

Shared Physical Memory
°Any processor can directly reference any memory location
°Any I/O controller can reach any memory
°The operating system can run on any processor, or all of them
The OS uses shared memory to coordinate
°Communication occurs implicitly as a result of loads and stores
°What about application processes?
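The "communication occurs implicitly" point can be made concrete with a toy sketch (the names and the lock-based coordination are illustrative, not from the lecture): one thread communicates a value to another purely through stores to and loads from a shared location.

```python
import threading

# Two threads share a "bulletin board" (a dict). Synchronization (the lock)
# is explicit, but communication is just an ordinary store and load.
shared = {"flag": False, "data": None}
lock = threading.Lock()

def producer():
    with lock:
        shared["data"] = 42     # communication: a plain store
        shared["flag"] = True

def consumer(result):
    while True:
        with lock:
            if shared["flag"]:  # communication: a plain load
                result.append(shared["data"])
                return

result = []
t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer, args=(result,))
t2.start(); t1.start()
t1.join(); t2.join()
print(result[0])  # -> 42
```

Contrast with the message-passing slides that follow: here no send or receive is named anywhere; the hardware (or here, the shared dict) carries the data.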

Message Passing Architectures
°A complete computer, including I/O, as the building block
Communication via explicit I/O operations
°Programming model: direct access only to the private address space (local memory); communication via explicit messages (send/receive)
°High-level block diagram
At what level is communication integrated? Memory, I/O, LAN, cluster
Easier to build and scale than SAS
°Programming model is more removed from basic hardware operations
Library or OS intervention
(Figure: nodes of processor + cache + memory connected by a network)

Message-Passing Abstraction
Send specifies the buffer to be transmitted and the receiving process
Recv specifies the sending process and the application storage to receive into
A memory-to-memory copy, but processes must be named
Optional tag on the send and matching rule on the receive
The user process names local data and entities in process/tag space too
In the simplest form, the send/recv match achieves a pairwise synchronization event
-Other variants too
Many overheads: copying, buffer management, protection
(Figure: process P executes Send X, Q, t; process Q executes Receive Y, P, t; the two match across the local process address spaces)
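The Send X, Q, t / Receive Y, P, t picture above can be sketched as a toy mailbox (a hypothetical helper, not a real MPI implementation): send copies local data into the destination's mailbox, and receive waits for a message whose sender and tag match.

```python
import queue, threading

# Toy mailboxes for two processes named "P" and "Q".
mailboxes = {"P": queue.Queue(), "Q": queue.Queue()}

def send(data, dest, src, tag):
    # Memory-to-memory copy of the buffer, named by destination process.
    mailboxes[dest].put((src, tag, list(data)))

def recv(buf, src, tag, me):
    # Wait for a message from `src` with matching `tag`, copy it into `buf`.
    while True:
        sender, t, payload = mailboxes[me].get()
        if sender == src and t == tag:       # the matching rule
            buf[:len(payload)] = payload     # copy into application storage
            return
        mailboxes[me].put((sender, t, payload))  # not a match; requeue

X = [1, 2, 3]   # address X, local to process P
Y = [0, 0, 0]   # address Y, local to process Q

p = threading.Thread(target=send, args=(X, "Q", "P", 7))
q = threading.Thread(target=recv, args=(Y, "P", 7, "Q"))
q.start(); p.start(); p.join(); q.join()
print(Y)  # -> [1, 2, 3]
```

Note how the copy itself is where the "many overheads" of the slide live: buffering, matching, and protection all sit between the two address spaces.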

Simulating Ocean Currents
°Model as two-dimensional grids
Discretize in space and time
Finer spatial and temporal resolution => greater accuracy
°Many different computations per time step: set up and solve equations
Concurrency across and within grid computations
°Static and regular
(Figure: (a) cross sections; (b) spatial discretization of a cross section)

Steps in Creating a Parallel Program
°Decomposition of the computation into tasks
°Assignment of tasks to processes
°Orchestration of data access, communication, and synchronization
°Mapping of processes to processors
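The four steps can be sketched on a toy reduction (a sum of squares; the chunking scheme and worker function are illustrative choices, not from the lecture):

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(100))
n_procs = 4

# 1. Decomposition: split the computation into tasks (strided chunks).
tasks = [data[i::n_procs] for i in range(n_procs)]

# 2. Assignment: one task per process.
# 3. Orchestration: workers compute locally; only the partial sums
#    need to be communicated back for the final reduction.
def worker(chunk):
    return sum(x * x for x in chunk)

# 4. Mapping: the executor places workers onto processors.
with ThreadPoolExecutor(max_workers=n_procs) as pool:
    total = sum(pool.map(worker, tasks))

print(total)  # -> 328350 (sum of squares 0..99)
```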

Discretize
°Time: forward difference
∂A/∂t ≈ (A_i^{n+1} − A_i^n) / Δt
°Space, 1st derivative:
∂A/∂x ≈ (A_{i+1}^n − A_i^n) / Δx
°Space, 2nd derivative:
∂²A/∂x² ≈ (A_{i+1}^n − 2 A_i^n + A_{i−1}^n) / Δx²
°Can use other discretizations
-Backward
-Leapfrog
(Figure: space-time grid with X grid points, time levels n−2, n−1, n, and boundary conditions at the ends of the spatial domain)

1-D Case
°The resulting update equation:
(A_i^{n+1} − A_i^n) / Δt = (1/Δx²) (A_{i+1}^n − 2 A_i^n + A_{i−1}^n) + B_i
°Or, at steady state:
0 = (1/Δx²) (A_{i+1}^n − 2 A_i^n + A_{i−1}^n) + B_i
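One explicit time step of this 1-D update can be written directly from the equation (a minimal sketch; the boundary values, step sizes, and source term B are illustrative choices):

```python
# Forward difference in time, central second difference in space:
# A_new[i] = A[i] + dt * ((A[i+1] - 2*A[i] + A[i-1]) / dx^2 + B[i])
def step(A, B, dt, dx):
    n = len(A)
    A_new = A[:]                      # boundary values are held fixed
    for i in range(1, n - 1):
        A_new[i] = A[i] + dt * ((A[i+1] - 2*A[i] + A[i-1]) / dx**2 + B[i])
    return A_new

A = [0.0] * 11
A[0], A[-1] = 1.0, 1.0                # boundary conditions
B = [0.0] * 11                        # no source term in this sketch
for _ in range(500):                  # iterate toward steady state
    A = step(A, B, dt=0.4, dx=1.0)    # dt/dx^2 <= 0.5 keeps this stable
```

With B = 0 and both boundaries at 1, the interior relaxes toward 1, i.e. the steady-state form 0 = ∇²A is satisfied.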

Multigrid
°Basic idea ---> Solve on a coarse grid ---> then on a fine grid
(Figure: coarse and fine grids, points indexed (1,1) through (8,8), iterate X^{k+1})

Multigrid
°Basic idea ---> Solve on a coarse grid ---> then on a fine grid
(Note: in practice, solve for the errors on the next finer grid, but the communication and computation patterns stay the same.)
(Figure: coarse and fine grids, points indexed (1,1) through (8,8), iterate X^{k+1}_{i,j})
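The coarse-then-fine idea can be sketched in 1-D (a simplified illustration: it relaxes the solution itself rather than the error correction the note mentions, and the grid sizes and sweep counts are arbitrary choices):

```python
# Solve u'' = f on [0,1] with u(0) = u(1) = 0 by Jacobi relaxation.
def relax(u, f, sweeps):
    h = 1.0 / (len(u) - 1)
    for _ in range(sweeps):
        u = [0.0] + [(u[i-1] + u[i+1] - h*h*f[i]) / 2.0
                     for i in range(1, len(u)-1)] + [0.0]
    return u

# Linear interpolation from n coarse points to 2n-1 fine points.
def interpolate(u_coarse):
    u = []
    for i in range(len(u_coarse) - 1):
        u += [u_coarse[i], (u_coarse[i] + u_coarse[i+1]) / 2.0]
    return u + [u_coarse[-1]]

n_fine = 17
f_fine = [-2.0] * n_fine            # u'' = -2  =>  exact u = x(1-x)
f_coarse = f_fine[::2]

u = relax([0.0] * 9, f_coarse, 50)  # solve (approximately) on coarse grid
u = interpolate(u)                  # transfer to the fine grid
u = relax(u, f_fine, 50)            # then refine on the fine grid
```

The payoff is that the cheap coarse-grid solve supplies the smooth, slowly-converging part of the answer, so the fine grid needs far fewer sweeps.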

Domain Decomposition
°Works well for scientific, engineering, graphics, ... applications
°Exploits the locally-biased nature of physical problems
Information requirements are often short-range
Or long-range but falling off with distance
°Simple example: nearest-neighbor grid computation
Perimeter-to-area comm-to-comp ratio (area-to-volume in 3-D)
Depends on n, p: decreases with n, increases with p

Domain Decomposition
°The best domain decomposition depends on information requirements
°Nearest-neighbor example: block versus strip decomposition
°Comm-to-comp ratio: 4·√p / n for block, 2·p / n for strip
°Application dependent: strip may be better in other cases
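A quick numeric comparison of the two ratios from the slide (assuming, as reconstructed above, block = 4√p/n and strip = 2p/n for an n×n grid on p processors):

```python
import math

def block_ratio(n, p):
    # Each block is (n/sqrt(p)) on a side; 4 edges of that length,
    # divided by n^2/p interior points, gives 4*sqrt(p)/n.
    return 4 * math.sqrt(p) / n

def strip_ratio(n, p):
    # Each strip exchanges two full rows of n points against n^2/p
    # interior points, giving 2*p/n.
    return 2 * p / n

n = 1024
for p in (4, 16, 64, 256):
    print(f"p={p:4d}  block={block_ratio(n, p):.4f}  strip={strip_ratio(n, p):.4f}")
# Block wins whenever 4*sqrt(p) < 2*p, i.e. whenever p > 4.
```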

Exploiting Temporal Locality
°Structure the algorithm so working sets map well to the hierarchy
-Often techniques that reduce inherent communication do well here
-Schedule tasks for data reuse once assigned
°Solver example: blocking
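Blocking can be illustrated with the classic tiled matrix multiply (a generic sketch, not the lecture's solver): the loops visit B×B tiles so that the tiles of each operand stay in cache and are reused many times before being evicted.

```python
# Tiled (blocked) matrix multiply: same arithmetic as the naive triple
# loop, but reordered so each Bsz x Bsz tile is reused while resident.
def matmul_blocked(A, B, Bsz):
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, Bsz):              # tile row of C
        for jj in range(0, n, Bsz):          # tile column of C
            for kk in range(0, n, Bsz):      # tile of the inner dimension
                for i in range(ii, min(ii + Bsz, n)):
                    for j in range(jj, min(jj + Bsz, n)):
                        s = C[i][j]
                        for k in range(kk, min(kk + Bsz, n)):
                            s += A[i][k] * B[k][j]
                        C[i][j] = s
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_blocked(A, B, 1))  # -> [[19.0, 22.0], [43.0, 50.0]]
```

Only the loop order changes; the result is identical to the untiled multiply, which is what makes blocking a pure locality optimization.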

1-D Array of Nodes for Jacobi
(Figure: a 1-D array of nodes, each performing N/P ops)
Model: 1 op = 1 cycle; 1 comm per hop = 1 cycle

Scalability
°T_par = N/P + √P
°Ideal speedup on any number of procs.
°Find the best P: setting ∂T_par/∂P = 0 gives P ∝ N^{2/3}, so T_par ∝ N^{1/3}
°So, with T_seq = N: S_R(N) = T_seq / T_par ∝ N^{2/3}, while S_I(N) ∝ N, i.e. S_R(N)/S_I(N) ∝ N^{−1/3}
°So, the 1-D array is scalable for Jacobi
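A numeric check of this analysis (assuming the cost model T_par = N/P + √P, which is a reconstruction of the garbled slide, not a confirmed formula): the optimal P should grow roughly as N^{2/3} and the minimized T_par as N^{1/3}.

```python
# Cost model (assumed): compute time N/P plus communication time sqrt(P).
def t_par(N, P):
    return N / P + P ** 0.5

# Brute-force search for the P that minimizes T_par.
def best_P(N):
    return min(range(1, N + 1), key=lambda P: t_par(N, P))

for N in (10**3, 10**4, 10**5):
    P = best_P(N)
    # Ratios to N^(2/3) and N^(1/3) should stay roughly constant.
    print(N, P, round(P / N ** (2 / 3), 2), round(t_par(N, P) / N ** (1 / 3), 2))
```

Calculus gives the same answer: ∂T_par/∂P = −N/P² + 1/(2√P) = 0 implies P = (2N)^{2/3}.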

Detailed Example
°p = 10⁻⁶, c = 0.1 × 10⁻⁶, m = 0.1 × 10⁻⁶
°Balance processing against communication: p · P = c · (N/P), or 10⁻⁶ · P = 0.1 × 10⁻⁶ · (N/P), or N = 1000 for balance
°Also, R_M = m · (N/P) = m · 100
°A memory size of m = 100 yields a balanced machine.

Better Algorithm, but basically Branch and Bound
°Little et al.
°Basic ideas:
Cost matrix

Better Algorithm, but basically Branch and Bound
°Little et al.
°Notion of reduction: subtract the same value from each row or column
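The reduction step can be sketched in code (the helper name and the 3-city cost matrix are illustrative): subtracting a constant from every entry of a row or column changes every tour's cost by exactly that constant, so the total amount subtracted is a lower bound on the optimal tour, which is what branch and bound prunes with.

```python
INF = float("inf")  # diagonal entries: no city-to-itself edge

def reduce_matrix(cost):
    cost = [row[:] for row in cost]
    bound = 0
    for i, row in enumerate(cost):                     # row reduction
        m = min(row)
        if 0 < m < INF:
            bound += m
            cost[i] = [c - m for c in row]
    n = len(cost)
    for j in range(n):                                 # column reduction
        m = min(cost[i][j] for i in range(n))
        if 0 < m < INF:
            bound += m
            for i in range(n):
                cost[i][j] -= m
    return cost, bound

C = [[INF, 4, 7],
     [3, INF, 5],
     [6, 2, INF]]
reduced, lb = reduce_matrix(C)
print(lb)       # -> 11: row mins 4+3+2, plus column min 2
```

After reduction every row and column of the matrix contains a zero, and `lb` seeds the bound at the root of the branch-and-bound tree.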

Better Algorithm, but basically Branch and Bound
°Little et al.

Communication Finite State Machine
°Each node has a processing part and a communications part
°The interface to the local processor is a FIFO
°Communication to near neighbors is pipelined

Statically Programmed Communication
°Data is transferred one node in one cycle
°An inter-processor path may require multiple cycles
(Figure: heavy arrows represent local transfers; grey arrows represent non-local transfers)