Today’s topics Single processors and the Memory Hierarchy

Prepared 7/28/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.

Today's topics
- Single processors and the memory hierarchy
- Busses and switched networks
- Interconnection network topologies
- Multiprocessors
- Multicomputers
- Flynn's taxonomy
- Modern clusters: hybrid systems

Processors and the Memory Hierarchy
- Registers (1 clock cycle, 100s of bytes)
- 1st-level cache (3-5 clock cycles, 100s of KBytes)
- 2nd-level cache (~10 clock cycles, MBytes)
- Main memory (~100 clock cycles, GBytes)
- Disk (milliseconds, 100s of GBytes and up)
[Diagram: CPU registers; separate 1st-level instruction and data caches; unified 2nd-level cache holding both instructions and data]
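The point of the hierarchy is that programs run fastest when they reuse data already sitting in the upper levels. A small sketch in Python of the classic locality experiment (the matrix size is an arbitrary choice, and in Python the cache effect is muted by interpreter overhead; in C the difference is far more dramatic):

```python
import time

N = 1000
# A 2-D array stored row by row (row-major order).
matrix = [[1] * N for _ in range(N)]

def sum_by_rows(m):
    """Walk memory in storage order: consecutive accesses reuse cache lines."""
    total = 0
    for row in m:
        for x in row:
            total += x
    return total

def sum_by_columns(m):
    """Jump a whole row between accesses: far less cache-line reuse."""
    total = 0
    for j in range(N):
        for i in range(N):
            total += m[i][j]
    return total

t0 = time.perf_counter(); s1 = sum_by_rows(matrix)
t1 = time.perf_counter(); s2 = sum_by_columns(matrix)
t2 = time.perf_counter()
# Both orders compute the same sum; the in-order walk is typically faster.
print(s1, s2, t1 - t0, t2 - t1)
```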

IBM Dual Core
[Figure from the Intel® 64 and IA-32 Architectures Optimization Reference Manual, http://www.intel.com/design/processor/manuals/248966.pdf]

Interconnection Network Topologies: Bus
A single shared data path.
- Pros: simplicity; easy cache coherence and synchronization
- Cons: fixed bandwidth; does not scale well
[Diagram: several CPUs attached to a bus shared with global memory]

Interconnection Network Topologies: Switch-Based
Built from m×n switches; many possible topologies. Characterized by:
- Diameter: worst-case number of switches between two processors; impacts latency
- Bisection width: minimum number of connections that must be removed to split the network into two halves; limits communication bandwidth
- Edges per switch: best if this is independent of the size of the network

Interconnection Network Topologies: Mesh
- 2-D mesh: a 2-D array of processors
- Torus (wraparound mesh): processors on opposite edges of the mesh are also connected
Characteristics (n nodes, arranged √n × √n):
- Diameter = 2(√n − 1) for the mesh, or 2⌊√n/2⌋ for the torus
- Bisection width = √n for the mesh, 2√n for the torus
- Switch size = 4
- Number of switches = n
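The mesh characteristics above are easy to compute directly. A quick sketch in Python (the helper name and the perfect-square restriction are mine, for illustration):

```python
import math

def mesh_characteristics(n):
    """Characteristics of a 2-D (non-wraparound) mesh of n nodes."""
    side = math.isqrt(n)
    assert side * side == n, "n must be a perfect square"
    return {
        "diameter": 2 * (side - 1),   # corner to opposite corner
        "bisection_width": side,      # cut the links crossing the middle
        "switch_size": 4,             # each switch has at most 4 neighbors
        "num_switches": n,            # one switch per processor
    }

print(mesh_characteristics(16))
```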

Interconnection Network Topologies: Hypercube
A d-dimensional hypercube has n = 2^d processors. Each processor is directly connected to d others (those whose binary labels differ in exactly one bit). The shortest path between any pair of processors is at most d.
Characteristics (n = 2^d nodes):
- Diameter = d
- Bisection width = n/2
- Switch size = d
- Number of switches = n
[Diagrams: 3-D and 4-D hypercubes]
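Because hypercube neighbors differ in exactly one bit of their labels, routing reduces to bit manipulation: the shortest-path length between two nodes is the Hamming distance of their labels. A small Python sketch (function names are mine):

```python
def neighbors(node, d):
    """In a d-dimensional hypercube, flip each of the d bits to get a neighbor."""
    return [node ^ (1 << i) for i in range(d)]

def distance(a, b):
    """Shortest-path length = Hamming distance of the labels (at most d)."""
    return bin(a ^ b).count("1")

# A 3-D hypercube has n = 2**3 = 8 nodes.
print(neighbors(0b000, 3))      # nodes 001, 010, 100
print(distance(0b000, 0b111))   # 3, the diameter of a 3-D hypercube
```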

Multistage Networks
Examples: butterfly, Omega, perfect shuffle.
Characteristics of an Omega network (n = 2^d nodes):
- Diameter = d − 1
- Bisection width = n/2
- Switch size = 2×2
- Number of switches = dn/2
[Diagram: an 8-input, 8-output Omega network built from 2×2 switches]
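The wiring between stages of an Omega network is the perfect-shuffle permutation: a cyclic left shift of a wire's d-bit address. A sketch in Python (the function name is mine, not from the slides):

```python
def perfect_shuffle(i, d):
    """Cyclic left shift of i's d-bit address: the inter-stage wiring
    pattern of an Omega network on n = 2**d wires."""
    top = (i >> (d - 1)) & 1              # most significant bit
    return ((i << 1) | top) & ((1 << d) - 1)

# For n = 2**3 = 8 wires, one shuffle interleaves the lower and upper halves.
print([perfect_shuffle(i, 3) for i in range(8)])
```

Applying the shuffle d times returns every wire to its starting position, which is why d stages of switches suffice to route between any input/output pair.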

Shared Memory
- One or more memory modules; a global address space (all system memory visible to all processors)
- Transfer of data between processors is usually implicit: simply read or write a given address (e.g., OpenMP)
- A cache-coherency protocol maintains consistency between processors
- UMA: uniform-memory-access shared-memory system
[Diagram: memory modules and CPUs connected through an interconnection network]
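Threads within one OS process are a software analogue of this model: every thread sees the same address space, so "communication" is just a load or store of a shared variable, with explicit synchronization where updates must not interleave (hardware cache coherence plays a similar consistency role for real shared-memory machines). A minimal Python sketch of the idea:

```python
import threading

counter = 0                      # lives in the single shared address space
lock = threading.Lock()

def worker(times):
    global counter
    for _ in range(times):
        with lock:               # synchronize the read-modify-write
            counter += 1         # communication is an ordinary store

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                   # all four threads updated the same variable
```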

Distributed Shared Memory
- Single address space with implicit communication
- Hardware support for reads/writes to non-local memories and for cache coherency
- Latency of a memory operation is greater when accessing non-local data than when accessing data within a CPU's own memory
- NUMA: non-uniform-memory-access shared-memory system
[Diagram: CPU/memory pairs connected through an interconnection network]

Distributed Memory
- Each processor has access to its own memory only
- Data transfer between processors is explicit: the user calls message-passing functions
- Common message-passing libraries: MPI, PVM
- The user has complete control over, and responsibility for, data placement and management
[Diagram: CPU/memory nodes connected through an interconnection network]

Hybrid Systems
- A distributed-memory system whose nodes are shared-memory multiprocessors
- The most common architecture for the current generation of parallel machines
[Diagram: multi-CPU nodes, each with its own memory and network interface, connected through an interconnection network]

Flynn's Taxonomy (Figure 2.20 from Quinn)
Machines are classified by the number of instruction streams and data streams:
- SISD (single instruction, single data): conventional uniprocessor
- SIMD (single instruction, multiple data): processor arrays, pipelined vector processors
- MISD (multiple instruction, single data): systolic arrays
- MIMD (multiple instruction, multiple data): multiprocessors, multicomputers
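Since the taxonomy is just the cross product of two axes, it can be expressed in a few lines. A toy Python sketch (the function and its argument names are mine, for illustration only):

```python
def flynn(instruction_streams, data_streams):
    """Name Flynn's category from the counts of instruction and data streams."""
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"

print(flynn(1, 1))   # SISD: conventional uniprocessor
print(flynn(1, 4))   # SIMD: processor arrays, vector processors
print(flynn(4, 4))   # MIMD: multiprocessors and multicomputers
```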

Top 500 List
Some highlights from http://www.top500.org/:
- On the new list, the IBM BlueGene/L system, installed at DOE's Lawrence Livermore National Laboratory (LLNL), retains the No. 1 spot with a Linpack performance of 280.6 teraflops (trillions of calculations per second, or Tflop/s).
- The new No. 2 system is Sandia National Laboratories' Cray Red Storm supercomputer, only the second system ever recorded to exceed the 100 Tflop/s mark, with 101.4 Tflop/s. The initial Red Storm system was ranked No. 9 in the last listing.
- Slipping to No. 3 from No. 2 last June is the IBM eServer Blue Gene Solution system, installed at IBM's Thomas Watson Research Center, with 91.20 Tflop/s Linpack performance.
- The new No. 5 is the largest system in Europe, an IBM JS21 cluster installed at the Barcelona Supercomputing Center. The system reached 62.63 Tflop/s.

Linux/Beowulf cluster basics
- Goal: get supercomputing processing power at the cost of a few PCs
- How: commodity components (PCs and networks) plus free, open-source software

CPU nodes
A typical configuration: dual-socket, dual-core AMD or Intel nodes with 4 GB of memory per node.

Network Options
[Figures from D. K. Panda's Nowlab website at Ohio State (http://nowlab.cse.ohio-state.edu/), Research Overview presentation]

Challenges
- Cooling
- Power constraints
- Reliability
- System administration