Parallel Computer Architectures

Presentation transcript:

Parallel Computer Architectures Chapter 8 Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education, Inc. All rights reserved. 0-13-148521-0

Parallel Computer Architectures (a) On-chip parallelism. (b) A coprocessor. (c) A multiprocessor. (d) A multicomputer. (e) A grid.

Instruction-Level Parallelism (a) A CPU pipeline. (b) A sequence of VLIW instructions. (c) An instruction stream with bundles marked.

The TriMedia VLIW CPU (1) A typical TriMedia instruction, showing five possible operations.

The TriMedia VLIW CPU (2) The TM3260 functional units, their quantity, latency, and which instruction slots they can use.

The TriMedia VLIW CPU (3) The major groups of TriMedia custom operations.

The TriMedia VLIW CPU (4) (a) An array of 8-bit elements. (b) The transposed array. (c) The original array fetched into four registers. (d) The transposed array in four registers.

On-Chip Multithreading (1) (a) – (c) Three threads. The empty boxes indicate that the thread has stalled waiting for memory. (d) Fine-grained multithreading. (e) Coarse-grained multithreading.

On-Chip Multithreading (2) Multithreading with a dual-issue superscalar CPU. (a) Fine-grained multithreading. (b) Coarse-grained multithreading. (c) Simultaneous multithreading.

Hyperthreading on the Pentium 4 Resource sharing between threads in the Pentium 4 NetBurst microarchitecture.

Homogeneous Multiprocessors on a Chip Single-chip multiprocessors. (a) A dual-pipeline chip. (b) A chip with two cores.

Heterogeneous Multiprocessors on a Chip (1) The logical structure of a simple DVD player, which contains a heterogeneous multiprocessor with multiple cores for different functions.

Heterogeneous Multiprocessors on a Chip (2) An example of the IBM CoreConnect architecture.

Introduction to Networking (1) How users are connected to servers on the Internet.

Introduction to Networking (2) A packet as it appears on the Ethernet.

Introduction to Network Processors A typical network processor board and chip.

The Nexperia Media Processor The Nexperia heterogeneous multiprocessor on a chip.

Multiprocessors (a) A multiprocessor with 16 CPUs sharing a common memory. (b) An image partitioned into 16 sections, each being analyzed by a different CPU.

Multicomputers (1) (a) A multicomputer with 16 CPUs, each with its own private memory. (b) The bit-map image of Fig. 8-17 split up among the 16 memories.

Multicomputers (2) Various layers where shared memory can be implemented. (a) The hardware. (b) The operating system. (c) The language runtime system.

Taxonomy of Parallel Computers (1) Flynn’s taxonomy of parallel computers.

Taxonomy of Parallel Computers (2) A taxonomy of parallel computers.

Sequential Consistency (a) Two CPUs writing and two CPUs reading a common memory word. (b) – (d) Three possible ways the two writes and four reads might be interleaved in time.
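
The interleaving idea in this caption can be checked by brute force. The sketch below is illustrative only: it assumes the word initially holds 0, that the two writers store 100 and 200, and that each reader reads twice (these values and the names `interleavings` and `outcomes` are assumptions, not taken from the slide). It enumerates every global order that respects each CPU's program order and records what the two readers observe.

```python
def interleavings(programs):
    """Yield every merge of the programs that keeps each program's own order."""
    if all(not p for p in programs):
        yield []
        return
    for i, p in enumerate(programs):
        if p:
            rest = programs[:i] + [p[1:]] + programs[i + 1:]
            for tail in interleavings(rest):
                yield [(i, p[0])] + tail


def outcomes():
    """Return the set of (reader A values, reader B values) pairs that can occur."""
    programs = [
        [("w", 100)],          # CPU 0 writes 100 (assumed value)
        [("w", 200)],          # CPU 1 writes 200 (assumed value)
        [("r", None)] * 2,     # CPU 2 reads the word twice
        [("r", None)] * 2,     # CPU 3 reads the word twice
    ]
    results = set()
    for order in interleavings(programs):
        mem = 0                # assumed initial contents of the shared word
        seen = {2: [], 3: []}
        for cpu, (op, val) in order:
            if op == "w":
                mem = val
            else:
                seen[cpu].append(mem)
        results.add((tuple(seen[2]), tuple(seen[3])))
    return results
```

Because every CPU observes the same single order, no outcome has one reader seeing 100 then 200 while the other sees 200 then 100.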

Weak Consistency Weakly consistent memory uses synchronization operations to divide time into sequential epochs.

UMA Symmetric Multiprocessor Architectures Three bus-based multiprocessors. (a) Without caching. (b) With caching. (c) With caching and private memories.

Snooping Caches The write-through cache coherence protocol. The empty boxes indicate that no action is taken.
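
The protocol table itself is not reproduced in this transcript, so the following is only a rough sketch of one common write-through variant: every write goes to memory, a write miss does not allocate a cache line, and a cache that snoops a remote write to a word it holds invalidates its copy. The class name `WriteThroughBus` and the default memory value 0 are assumptions made for illustration.

```python
class WriteThroughBus:
    """Toy model of snooping caches on a shared bus (write-through, invalidate)."""

    def __init__(self, n_cpus):
        self.memory = {}                              # addr -> value
        self.caches = [dict() for _ in range(n_cpus)]

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache:                         # read miss: fetch from memory
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]                            # read hit: use local copy

    def write(self, cpu, addr, value):
        self.memory[addr] = value                     # always update memory
        if addr in self.caches[cpu]:                  # write hit: update local copy too
            self.caches[cpu][addr] = value
        for other, cache in enumerate(self.caches):
            if other != cpu:
                cache.pop(addr, None)                 # snoopers invalidate their copy
```

For example, after CPU 0 reads a word and CPU 1 writes it, CPU 0's copy is gone and its next read fetches the new value from memory.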

The MESI Cache Coherence Protocol

UMA Multiprocessors Using Crossbar Switches (a) An 8 × 8 crossbar switch. (b) An open crosspoint. (c) A closed crosspoint.

UMA Multiprocessors Using Multistage Switching Networks (1) (a) A 2 × 2 switch. (b) A message format.

UMA Multiprocessors Using Multistage Switching Networks (2) An omega switching network.
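
Routing through an omega network can be sketched in a few lines. The sketch assumes the usual construction: 2^k inputs, k stages of 2 × 2 switches, a perfect shuffle in front of each stage, and each stage consuming one bit of the destination number, most significant bit first (0 selects the upper output, 1 the lower). The function name `omega_route` is hypothetical.

```python
def omega_route(source, dest, k):
    """Trace a message through a 2**k-input omega network.

    Returns (final_line, path), where path lists (stage, switch, output_bit).
    """
    n = 1 << k
    pos = source
    path = []
    for stage in range(k):
        # Perfect shuffle: rotate the k-bit line number left by one bit.
        pos = ((pos << 1) | (pos >> (k - 1))) & (n - 1)
        # The switch looks at the next destination bit, MSB first.
        bit = (dest >> (k - 1 - stage)) & 1
        # Leaving on the upper (0) or lower (1) output sets the low bit.
        pos = (pos & ~1) | bit
        path.append((stage, pos >> 1, bit))
    return pos, path
```

After k stages the line number equals the destination regardless of the source, which is why the destination bits alone are enough to steer the message.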

NUMA Multiprocessors A NUMA machine based on two levels of buses. The Cm* was the first multiprocessor to use this design.

Cache Coherent NUMA Multiprocessors (a) A 256-node directory-based multiprocessor. (b) Division of a 32-bit memory address into fields. (c) The directory at node 36.
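
The field widths are not spelled out in this transcript. As a sketch, assume the 32-bit address is split into an 8-bit node number (enough for 256 nodes), an 18-bit block number within the node, and a 6-bit offset within a 64-byte block; these widths and the name `split_address` are assumptions for illustration.

```python
NODE_BITS, BLOCK_BITS, OFFSET_BITS = 8, 18, 6   # assumed split of 32 bits


def split_address(addr):
    """Split a 32-bit physical address into (node, block, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    block = (addr >> OFFSET_BITS) & ((1 << BLOCK_BITS) - 1)
    node = (addr >> (OFFSET_BITS + BLOCK_BITS)) & ((1 << NODE_BITS) - 1)
    return node, block, offset
```

Under this split, the address 0x24000108 belongs to node 36, block 4, offset 8, so a request for it would be sent to the directory at node 36.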

The Sun Fire E25K NUMA Multiprocessor (1) The Sun Microsystems E25K multiprocessor.

The Sun Fire E25K NUMA Multiprocessor (2) The Sun Fire E25K uses a four-level interconnect. Dashed lines are address paths. Solid lines are data paths.

Message-Passing Multicomputers A generic multicomputer.

Topology Various topologies. The heavy dots represent switches. The CPUs and memories are not shown. (a) A star. (b) A complete interconnect. (c) A tree. (d) A ring. (e) A grid. (f) A double torus. (g) A cube. (h) A 4D hypercube.
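
Of these topologies, the hypercube is the easiest to express in code: if nodes are numbered in binary, two nodes are linked exactly when their numbers differ in one bit. The helper names `neighbors` and `hops` below are hypothetical; this is a sketch of that numbering scheme, not of any particular machine.

```python
def neighbors(node, dim):
    """Nodes directly linked to `node` in a dim-dimensional hypercube."""
    return [node ^ (1 << bit) for bit in range(dim)]


def hops(a, b):
    """Minimum number of links between nodes a and b (bits that differ)."""
    return bin(a ^ b).count("1")
```

In a 4D hypercube each node has 4 neighbors, and the longest shortest path (the diameter) is 4 hops, between a node and its bitwise complement.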

The BlueGene/L custom processor chip.

The BlueGene/L. (a) Chip. (b) Card. (c) Board. (d) Cabinet. (e) System.

Packaging of the Red Storm components.

The Red Storm system as viewed from above.

A Comparison of BlueGene/L and Red Storm

Processing of a Google query.

Google (2) A typical Google cluster.

Scheduling Scheduling a cluster. (a) FIFO. (b) Without head-of-line blocking. (c) Tiling. The shaded areas indicate idle CPUs.
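
The difference between the first two policies can be shown with a simplified model in which each queued job is just the number of CPUs it needs and the scheduler decides which jobs to start right now. The function names and the example queue are assumptions; run times and the tiling policy are left out of this sketch.

```python
def fifo_start(queue, free_cpus):
    """Strict FIFO: stop at the first job that does not fit."""
    started = []
    for need in queue:
        if need > free_cpus:
            break                      # head of the line blocks everything behind it
        started.append(need)
        free_cpus -= need
    return started


def skip_start(queue, free_cpus):
    """No head-of-line blocking: skip jobs that do not fit, keep scanning."""
    started = []
    for need in queue:
        if need <= free_cpus:
            started.append(need)
            free_cpus -= need
    return started
```

With 8 free CPUs and a queue of jobs needing 4, 8, 2, and 1 CPUs, strict FIFO starts only the first job and leaves 4 CPUs idle, while the skipping policy also starts the 2-CPU and 1-CPU jobs.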

Distributed Shared Memory (1) A virtual address space consisting of 16 pages spread over four nodes of a multicomputer. (a) The initial situation. ….

Distributed Shared Memory (2) A virtual address space consisting of 16 pages spread over four nodes of a multicomputer. … (b) After CPU 0 references page 10. …

Distributed Shared Memory (3) A virtual address space consisting of 16 pages spread over four nodes of a multicomputer. … (c) After CPU 1 references page 10, here assumed to be a read-only page.
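
The three snapshots suggest a simple rule: a reference to a page that is not local either moves the page to the referencing node or, for a read-only page, adds a replica there. The sketch below models only that bookkeeping. The function `reference`, the `holders` table, and the assumption that page 10 starts on node 1 are all hypothetical, since the figure contents are not in this transcript.

```python
def reference(holders, page, cpu, read_only=False):
    """Record a reference by `cpu` to `page`.

    holders maps page number -> set of nodes that currently hold a copy.
    """
    nodes = holders[page]
    if cpu in nodes:
        return "hit"                   # page already local, no fault
    if read_only:
        nodes.add(cpu)                 # read-only pages can be replicated
        return "replicated"
    holders[page] = {cpu}              # writable page migrates to the referencing node
    return "migrated"
```

Starting from `{10: {1}}`, a reference by CPU 0 migrates page 10 to node 0; if the page is then treated as read-only, a reference by CPU 1 leaves copies on both nodes.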

Linda Three Linda tuples.
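
Linda programs communicate through a shared tuple space: `out` deposits a tuple, `in` removes a matching tuple, and `read` copies one without removing it. The toy model below is a single-process sketch under stated assumptions: `None` stands for a formal (wildcard) field, the method is spelled `in_` because `in` is a Python keyword, and, unlike real Linda, a failed match returns `None` instead of blocking. The example tuples are made up.

```python
class TupleSpace:
    """Minimal, non-blocking imitation of a Linda tuple space."""

    def __init__(self):
        self._tuples = []

    def out(self, *fields):
        """Deposit a tuple into the space."""
        self._tuples.append(tuple(fields))

    def _find(self, pattern):
        for t in self._tuples:
            if len(t) == len(pattern) and all(
                p is None or p == f for p, f in zip(pattern, t)
            ):
                return t
        return None

    def rd(self, *pattern):
        """Return a matching tuple, leaving it in the space."""
        return self._find(pattern)

    def in_(self, *pattern):
        """Return a matching tuple and remove it from the space."""
        t = self._find(pattern)
        if t is not None:
            self._tuples.remove(t)
        return t
```

A worker could, for instance, call `ts.in_("task", None)` to claim any tuple whose first field is "task".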

A simplified ORCA stack object, with internal data and two operations.

Software Metrics (1) Real programs achieve less than the perfect speedup indicated by the dotted line.

Software Metrics (2) (a) A program has a sequential part and a parallelizable part. (b) Effect of running part of the program in parallel.
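
The effect described here is Amdahl's law. If a fraction f of the run time T is inherently sequential and the rest is spread over n CPUs, the new run time is fT + (1 − f)T/n, so the speedup is n / (1 + (n − 1)f). A small sketch (the function name `speedup` is an assumption):

```python
def speedup(f, n):
    """Amdahl's law: speedup on n CPUs when fraction f of the work is sequential."""
    if not 0.0 <= f <= 1.0 or n < 1:
        raise ValueError("need 0 <= f <= 1 and n >= 1")
    return n / (1 + (n - 1) * f)
```

With f = 0 the speedup is the perfect n; with f = 0.5, even two CPUs give only about 1.33, and no number of CPUs can push the speedup past 2.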

Achieving High Performance (a) A 4-CPU bus-based system. (b) A 16-CPU bus-based system. (c) A 4-CPU grid-based system. (d) A 16-CPU grid-based system.

Grid Computing The grid layers.